* Pandoc breaks table headers when converting HTML (exported from Confluence) to Github Flavored Markdown
From: 'Michael Mell' via pandoc-discuss @ 2023-07-14 6:03 UTC
To: pandoc-discuss

I am trying to convert HTML pages from our Confluence Wiki to GitHub Flavored Markdown for the GitHub Wiki.

I want to remove all formatting to get "vanilla" Markdown output without embedded HTML. I settled on this command for the moment:

```sh
pandoc failing_table_tidy_reduced.html -f html-native_divs-native_spans -t gfm-raw_html -o failing_table_tidy_reduced.md
```

**(The contents of `failing_table_tidy_reduced.html` are pasted below.)**

The Markdown output is OK for the most part, except that the table headers are systematically broken: the header row comes out empty and the header text lands in the first body row. I get this for the example file that is pasted below:

```md
| | | |
|------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
| ![](attachments/314948158/314950704.png) | ![](attachments/314948158/314950710.png) | ![](attachments/314948158/314950785.png) |
```

Whereas I expect the text (i.e. "Step N: ...") to be in the table header, like so:

```md
| Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
|------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| ![](attachments/314948158/314950704.png) | ![](attachments/314948158/314950710.png) | ![](attachments/314948158/314950785.png) |
```

What am I doing wrong?

---
This is the content of `failing_table_tidy_reduced.html`:

```html
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.6.0">
<title>Title</title>
<link rel="stylesheet" href="styles/site.css" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type='text/css'>
/*<![CDATA[*/
div.rbtoc1689000519714 {padding: 0px;}
div.rbtoc1689000519714 ul {margin-left: 0px;}
div.rbtoc1689000519714 li {margin-left: 0px;padding-left: 0px;}

/*]]>*/
</style>
</head>
<body class="theme-default aui-theme-default">
<div class="table-wrap">
<table class="wrapped relative-table confluenceTable" style=
"width: 48.0112%;">
<colgroup>
<col style="width: 27.3364%;">
<col style="width: 28.271%;">
<col style="width: 44.3925%;"></colgroup>
<tbody>
<tr>
<th class="confluenceTh">
<p>Step 1: Select to open image as virtual stack.</p>
</th>
<th class="confluenceTh">
<p>Step 2: Select image folder and open dataset.</p>
</th>
<th class="confluenceTh">Step 3: View with opened image stack. Use
the slider of in the phase contrast histogram (top) to adjust image
saturation for better channel visibility.</th>
</tr>
<tr>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image confluence-thumbnail"
draggable="false" height="250" src=
"attachments/314948158/314950704.png" data-image-src=
"attachments/314948158/314950704.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950704" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-0-46.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image confluence-thumbnail"
draggable="false" height="250" src=
"attachments/314948158/314950710.png" data-image-src=
"attachments/314948158/314950710.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950710" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-1-20.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image"
draggable="false" height="250" src=
"attachments/314948158/314950785.png" data-image-src=
"attachments/314948158/314950785.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950785" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-12-47.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
```

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e4b6b290-ab59-4ff6-83ac-47b017e033f5n%40googlegroups.com.

* Re: Pandoc breaks table headers when converting HTML (exported from Confluence) to Github Flavored Markdown
From: John MacFarlane @ 2023-07-14 18:39 UTC
To: pandoc-discuss

I'm guessing the issue is that the heading for your table is inside the tbody element, rather than thead.
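
If that guess is right, one possible workaround is to pre-process the Confluence export so that a leading row made up only of `<th>` cells is moved out of `<tbody>` into a `<thead>` before pandoc reads the file. The following is a minimal Python sketch of that idea, not something taken from the thread. `promote_header_rows` is a hypothetical helper name, and the regex approach assumes simple, non-nested tables like the one in the example file; a real HTML parser would be more robust.

```python
import re

# Matches an opening <tbody> tag, any whitespace after it, and the first
# <tr>...</tr> row that follows. Assumes tables are not nested, because the
# non-greedy match stops at the first </tr> it finds.
_FIRST_ROW = re.compile(
    r"(<tbody\b[^>]*>)(\s*)(<tr\b[^>]*>.*?</tr>)",
    re.IGNORECASE | re.DOTALL,
)


def promote_header_rows(html: str) -> str:
    """Move a leading all-<th> row from each <tbody> into a new <thead>.

    Hypothetical helper for illustration; it is not part of pandoc.
    """

    def fix(match: re.Match) -> str:
        tbody_open, gap, row = match.groups()
        has_td = re.search(r"<td\b", row, re.IGNORECASE) is not None
        has_th = re.search(r"<th\b", row, re.IGNORECASE) is not None
        # Leave the table alone unless the first row contains only <th> cells.
        if has_td or not has_th:
            return match.group(0)
        # Emit the header row inside <thead>, then reopen <tbody> for the rest.
        return f"<thead>{gap}{row}{gap}</thead>{gap}{tbody_open}"

    return _FIRST_ROW.sub(fix, html)
```

To try it, read `failing_table_tidy_reduced.html`, pass its contents through `promote_header_rows`, write the result to a new file, and run the same pandoc command as above on that file. Whether the header row then appears as expected depends on the guess about `thead` being correct for your pandoc version.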