Getting pandoc to convert Github Markdown documents with HTML tags to PDF

public inbox archive for pandoc-discuss@googlegroups.com

* Getting pandoc to convert Github Markdown documents with HTML tags to PDF
@ 2023-07-03 22:41 Luveh Keraph
       [not found] ` <b1dae07b-11d1-4c98-8fcf-369f2b23a54cn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Luveh Keraph @ 2023-07-03 22:41 UTC (permalink / raw)
  To: pandoc-discuss

I have a GitHub Markdown document that contains HTML tags, mostly for
special characters (e.g. &Hscr;) and for placing pictures where I want them
on the page. The thing is, pandoc seems to ignore the HTML tags. Is this an
intrinsic limitation of pandoc, or is there a way to get pandoc to process
such tags and produce the right output?

The pandoc invocation that I am currently using for converting my GitHub
Markdown documents to PDF is

 $ pandoc --resource-path=/home/abc/Repos.wiki -t html5 \
     --pdf-engine=wkhtmltopdf --metadata pagetitle="MyDoc.md" \
     --css github.css -o MyDoc.pdf

The default invocation, pandoc MyDoc.md -o MyDoc.pdf, does not handle
images properly (it sometimes rearranges the surrounding paragraphs the
wrong way) and seems unable to deal with expressions like
_A_<sub>_m_</sub>: the <sub> and </sub> tags appear to be ignored.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b1dae07b-11d1-4c98-8fcf-369f2b23a54cn%40googlegroups.com.


* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
       [not found] ` <b1dae07b-11d1-4c98-8fcf-369f2b23a54cn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2023-07-04 21:50   ` John MacFarlane
       [not found]     ` <529BC174-779A-4D98-BCC9-F59AEAAC2B9D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2023-07-04 21:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

HTML tags should be passed through to HTML formats.

Have you looked at the intermediate HTML produced?  You can use --verbose to see it.

This seems to work fine:

% pandoc -t html5
_A_<sub>_m_</sub>
<p><em>A</em><sub><em>m</em></sub></p>

PS. You probably want to use -f gfm if you're targeting GitHub Markdown.

Pandoc version?
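
One way to look at the intermediate HTML directly is to stop at HTML instead of going all the way to PDF. The following is only a minimal sketch: sample.md and sample.html are hypothetical scratch names, not files from the original report, and the conversion is skipped if pandoc is not on the PATH.

```shell
# Write a one-line GitHub-flavored Markdown sample that uses raw HTML tags.
# sample.md and sample.html are hypothetical scratch names.
printf '%s\n' '_A_<sub>_m_</sub>' > sample.md

# Convert to HTML only, so the markup that would be handed to the
# PDF engine can be inspected by eye.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f gfm -t html5 sample.md -o sample.html
  cat sample.html
else
  echo "pandoc is not installed; skipping conversion" >&2
fi
```

If the <sub> tags are present in sample.html, any remaining problem is more likely in the HTML-to-PDF step than in pandoc's Markdown parsing.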




* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
       [not found]     ` <529BC174-779A-4D98-BCC9-F59AEAAC2B9D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2023-07-04 23:07       ` Luveh Keraph
       [not found]         ` <CAFy1yb2op3Aq=P4L7xpNwPBBHtopKMx+urWz+-VQ+5Mh0CM=hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Luveh Keraph @ 2023-07-04 23:07 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks. I invoked pandoc -f gfm MyDoc.md -o MyDoc.pdf, and in the resulting
PDF document the subscripts are still ignored. When running it with
--verbose, I saw numerous instances of

[INFO] Not rendering RawInline (Format "html") "</sub>"
[INFO] Not rendering RawInline (Format "html") "<sub>"

However, when I added -t html5 to the invocation, the diagnostics above
disappeared and the subscripts were indeed present in the converted PDF file.
Thanks for the tip - it has indeed improved things. It is still the
case, though, that things like &mdash; or &Hscr; are ignored by pandoc. Any
suggestions on how to get pandoc to process them?

I am using the following:

pandoc 3.1.4
Features: +server +lua
Scripting engine: Lua 5.4
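
The difference between the two runs can be reproduced on a small input. This is a rough sketch rather than the exact commands used above: sample.md is a hypothetical scratch file, and the comparison assumes that PDF output without -t html5 goes through pandoc's LaTeX writer, which is where the "Not rendering RawInline" messages come from.

```shell
# Hypothetical scratch file with the same construct as in the report.
printf '%s\n' '_A_<sub>_m_</sub>' > sample.md

if command -v pandoc >/dev/null 2>&1; then
  # LaTeX route: raw HTML cannot be represented, so the tags are dropped;
  # --verbose reports each dropped tag on stderr.
  pandoc -f gfm -t latex --verbose sample.md

  # HTML route: the same raw tags are passed through to the output.
  pandoc -f gfm -t html5 sample.md
else
  echo "pandoc is not installed; skipping comparison" >&2
fi
```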







* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
       [not found]         ` <CAFy1yb2op3Aq=P4L7xpNwPBBHtopKMx+urWz+-VQ+5Mh0CM=hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2023-07-05 18:41           ` John MacFarlane
       [not found]             ` <F4D52E47-33F8-4A2C-9A56-679BD5240ABD-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2023-07-05 18:41 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

&mdash; and &Hscr; will be parsed as Unicode characters and these will be passed through to the HTML.
You can check the intermediate HTML file (again it will be printed with --verbose) to confirm this.
It may be that the program that is being invoked to go from HTML -> PDF (wkhtmltopdf?) doesn't handle these characters properly.
You could try adding the `--ascii` option, which will force entities to be used.
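
To see what the entities turn into, the HTML output can be compared with and without `--ascii`. A small sketch, assuming a hypothetical scratch file named entities.md; no particular output is claimed here, since the exact entity form may vary between pandoc versions.

```shell
# Hypothetical scratch file containing the two entities from the question.
printf '%s\n' 'Dash: &mdash; Script H: &Hscr;' > entities.md

if command -v pandoc >/dev/null 2>&1; then
  # Default: the entities are parsed and written out as literal
  # Unicode characters.
  pandoc -f gfm -t html5 entities.md

  # With --ascii: non-ASCII characters are written as entities instead.
  pandoc -f gfm -t html5 --ascii entities.md
else
  echo "pandoc is not installed; skipping comparison" >&2
fi
```

If both runs show the characters (as literal Unicode or as entities), pandoc has processed them and the remaining question is how the PDF engine renders them.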


* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
       [not found]             ` <F4D52E47-33F8-4A2C-9A56-679BD5240ABD-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2023-07-05 19:16               ` Luveh Keraph
       [not found]                 ` <CAFy1yb3hBrj7FUSM7wDiFY7hEB+GQ1PJSB4RiUo5YRNJnACZjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Luveh Keraph @ 2023-07-05 19:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks. What I see for &Hscr; is that a non-printable character is
generated. When using --verbose --ascii -t html5 it appears as &#x210B; in
the resulting file, and just an empty space (as far as I can see) in the
PDF file.


* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
       [not found]                 ` <CAFy1yb3hBrj7FUSM7wDiFY7hEB+GQ1PJSB4RiUo5YRNJnACZjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2023-07-05 19:47                   ` John MacFarlane
  0 siblings, 0 replies; 6+ messages in thread
From: John MacFarlane @ 2023-07-05 19:47 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

That's a font issue.  You may need to specify a font with --pdf-engine-opt
or try another PDF engine that works with HTML.
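
A quick way to test the font explanation, sketched under some assumptions: fontconfig's fc-list is available (typical on Linux, not guaranteed elsewhere), and weasyprint is used as the alternative HTML-based PDF engine. The names glyph.md and glyph.pdf are hypothetical scratch files.

```shell
# U+210B is the code point that &Hscr; resolves to (the &#x210B; seen
# in the intermediate HTML).
CODEPOINT=210b

# List installed font families that contain a glyph for that code point.
# An empty list would be consistent with the blank space in the PDF.
if command -v fc-list >/dev/null 2>&1; then
  fc-list ":charset=${CODEPOINT}" family | sort -u
fi

# Hypothetical scratch input containing only the problematic entity.
printf '%s\n' 'Script H: &Hscr;' > glyph.md

# Try a different HTML-based PDF engine (assumption: weasyprint is installed).
if command -v pandoc >/dev/null 2>&1 && command -v weasyprint >/dev/null 2>&1; then
  pandoc -f gfm -t html5 --pdf-engine=weasyprint \
    --metadata pagetitle="glyph" glyph.md -o glyph.pdf \
    || echo "conversion failed" >&2
fi
```

If no installed font covers the character, installing one that does, or naming such a font in the CSS passed with --css, is probably needed whichever engine is used.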

