* Getting pandoc to convert Github Markdown documents with HTML tags to PDF
From: Luveh Keraph @ 2023-07-03 22:41 UTC
To: pandoc-discuss

I have a Github Markdown document that contains HTML tags - mostly to do with special characters (e.g. ℋ) and stuff to place pictures where I want in the page. The thing is, pandoc seems to ignore the HTML tags. Is this a limitation intrinsic to pandoc, or is there any way to get pandoc to process such tags and produce the right output?

The pandoc invocation that I am currently using for converting my Github Markdown documents to PDF is

$ pandoc --resource-path=/home/abc/Repos.wiki -t html5 --pdf-engine=wkhtmltopdf --metadata pagetitle="MyDoc.md" --css github.css -o MyDoc.pdf

The default invocation pandoc MyDoc.md -o MyDoc.pdf is not dealing with images properly (in that it sometimes rearranges surrounding paragraphs the wrong way) and it seems to be unable to deal with expressions like _A_<sub>_m_</sub>, in that the <sub> and </sub> directives seem to be ignored.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b1dae07b-11d1-4c98-8fcf-369f2b23a54cn%40googlegroups.com.

* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
From: John MacFarlane @ 2023-07-04 21:50 UTC
To: pandoc-discuss

HTML tags should be passed through to HTML formats.

Have you looked at the intermediate HTML produced? You can use --verbose to see it.

This seems to work fine:

% pandoc -t html5
_A_<sub>_m_</sub>
<p><em>A</em><sub><em>m</em></sub></p>

PS. You probably want to use -f gfm if you're targeting GitHub Markdown.

Pandoc version?

* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
From: Luveh Keraph @ 2023-07-04 23:07 UTC
To: pandoc-discuss

Thanks. I invoked

pandoc -f gfm MyDoc. -o MyDoc.pdf

and in the resulting PDF document the subscripts are still ignored. When running it with --verbose, I saw numerous instances of the following in the output:

[INFO] Not rendering RawInline (Format "html") "</sub>"
[INFO] Not rendering RawInline (Format "html") "<sub>"

However, when I added -t html5 to the invocation, the diagnostics above disappeared, and the subscripts are indeed present in the converted PDF file. Thanks for the tip - it has indeed improved things. Now it is still the case that things like — or ℋ are ignored by pandoc. Any suggestions on how to get pandoc to process them?

I am using the following:

pandoc 3.1.4
Features: +server +lua
Scripting engine: Lua 5.4
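
The [INFO] lines quoted above are what pandoc reports when raw HTML cannot be represented in the output format, which is the case when the PDF is produced through pandoc's default LaTeX route rather than through -t html5. As a rough way to see which tags are being dropped in a long --verbose log, a small Python sketch along these lines could tally them. The function name and the sample text are illustrative assumptions; the pattern only covers the message wording quoted in this thread.

```python
import re
from collections import Counter

# Matches the informational message quoted in the thread. The exact wording
# is taken from that message; other pandoc versions may phrase it differently.
NOT_RENDERED = re.compile(
    r'^\[INFO\] Not rendering RawInline \(Format "(?P<fmt>[^"]+)"\) "(?P<raw>.*)"$'
)


def dropped_raw_inlines(log_text):
    """Count (format, raw snippet) pairs reported as not rendered."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOT_RENDERED.match(line.strip())
        if match:
            counts[(match.group("fmt"), match.group("raw"))] += 1
    return counts


# Sample text copied from the message above; a real run would read the
# diagnostics that pandoc --verbose writes to stderr.
sample_log = '''[INFO] Not rendering RawInline (Format "html") "</sub>"
[INFO] Not rendering RawInline (Format "html") "<sub>"
'''

for (fmt, raw), n in sorted(dropped_raw_inlines(sample_log).items()):
    print(f"{n} x {fmt}: {raw}")
```

An empty tally after adding -t html5 would be consistent with the diagnostics disappearing, as reported above.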

* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
From: John MacFarlane @ 2023-07-05 18:41 UTC
To: pandoc-discuss

— and ℋ will be parsed as unicode characters and these will be passed through to the HTML.
You can check the intermediate HTML file (again it will be printed with --verbose) to confirm this.
It may be that the program that is being invoked to go from HTML -> PDF (wkhtmltopdf ?) doesn't handle these characters properly.
You could try adding the `--ascii` option which will force entities to be used.
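
To illustrate the point about entities, the Python sketch below shows that a non-ASCII character and its character reference are two spellings of the same text, so an HTML renderer should treat them identically. This is only an analogy for what --ascii does: the helper name is an assumption, and pandoc itself may emit named or hexadecimal entities rather than the decimal references produced here.

```python
import html


def to_ascii_references(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Em dash (U+2014) and script capital H (U+210B), written as escapes so the
# example does not depend on the encoding of this file.
sample = "A \u2014 \u210b"
ascii_only = to_ascii_references(sample)
print(ascii_only)

# Decoding the references gives back the original characters.
assert html.unescape(ascii_only) == sample
```

If the PDF still shows a blank where the character should be after switching to entities, the HTML itself is probably fine and the problem lies further down the chain.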

* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
From: Luveh Keraph @ 2023-07-05 19:16 UTC
To: pandoc-discuss

Thanks. What I see for ℋ is that a non-printable character is generated. When using --verbose --ascii -t html5 it appears as ℋ in the resulting file, and just an empty space (as far as I can see) in the PDF file.

* Re: Getting pandoc to convert Github Markdown documents with HTML tags to PDF
From: John MacFarlane @ 2023-07-05 19:47 UTC
To: pandoc-discuss

That's a font issue. You may need to specify a font with --pdf-engine-opt or try another PDF engine that works with HTML.
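
One way to experiment with the two suggestions above is to build the pandoc command line programmatically and vary the engine or the stylesheet. The Python sketch below only assembles and prints the command; it does not run pandoc. The function name, the default engine, and the stylesheet fonts.css (imagined here as a file that sets a font-family containing the missing glyph) are all assumptions for illustration. Note that pandoc spells the pass-through option --pdf-engine-opt, given once per engine argument.

```python
import shlex

# A few HTML-based engines that pandoc's --pdf-engine option accepts.
HTML_ENGINES = ("wkhtmltopdf", "weasyprint", "prince")


def pdf_command(source, output, engine="weasyprint", css=None, engine_opts=()):
    """Build a pandoc argv list for GitHub Markdown -> HTML -> PDF."""
    if engine not in HTML_ENGINES:
        raise ValueError(f"not an HTML-based PDF engine: {engine}")
    argv = ["pandoc", "-f", "gfm", "-t", "html5", f"--pdf-engine={engine}"]
    if css:
        argv += ["--css", css]
    # Each engine argument is forwarded with its own --pdf-engine-opt flag.
    for opt in engine_opts:
        argv.append(f"--pdf-engine-opt={opt}")
    argv += ["-o", output, source]
    return argv


# fonts.css is a hypothetical stylesheet, not a file mentioned in the thread.
print(shlex.join(pdf_command("MyDoc.md", "MyDoc.pdf", css="fonts.css")))
```

Returning a list rather than a string keeps the arguments separate, so the result can be passed to subprocess.run without shell quoting concerns once the chosen engine is installed.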