public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Pandoc breaks table headers when converting HTML (exported from Confluence) to Github Flavored Markdown
Date: Fri, 14 Jul 2023 11:39:44 -0700
Message-ID: <617D7B7C-C5B6-43D3-9789-5014701BF8AC@gmail.com>
In-Reply-To: <e4b6b290-ab59-4ff6-83ac-47b017e033f5n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

I'm guessing the issue is that the heading for your table is inside the tbody element, rather than thead.
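
If that guess is right, one possible workaround is to move the leading row of `<th>` cells out of `<tbody>` and into a `<thead>` before handing the file to pandoc. The following is only a minimal sketch in Python, not something taken from pandoc itself: `promote_header_row` is a hypothetical helper name, and the regex-based approach assumes simple exports like the one quoted below (no nested tables, no existing `<thead>`). A real HTML parser would be more robust.

```python
import re

# Matches an opening <tbody> tag followed by its first <tr>...</tr> row.
# Assumption: rows do not contain nested tables, so the first "</tr>"
# really closes the first row.
TBODY_FIRST_ROW = re.compile(
    r"(<tbody\b[^>]*>)\s*(<tr\b.*?</tr>)",
    re.DOTALL | re.IGNORECASE,
)


def promote_header_row(html: str) -> str:
    """Move a leading all-<th> row from <tbody> into a new <thead>.

    Hypothetical helper for illustration only.
    """

    def fix(match: re.Match) -> str:
        tbody_open, row = match.group(1), match.group(2)
        has_th = re.search(r"<th\b", row, re.IGNORECASE)
        has_td = re.search(r"<td\b", row, re.IGNORECASE)
        # Only promote rows made up of header cells and no data cells.
        if has_th and not has_td:
            return f"<thead>\n{row}\n</thead>\n{tbody_open}"
        return match.group(0)

    return TBODY_FIRST_ROW.sub(fix, html)
```

The result can be written to a new file and converted with the same pandoc command as in the original message. Tables whose first row mixes `<th>` and `<td>` cells are left untouched.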

> On Jul 13, 2023, at 11:03 PM, 'Michael Mell' via pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
> 
> I am trying to convert HTML pages from our Confluence Wiki to GitHub Flavored Markdown for the GitHub Wiki.
> 
> I want to remove all formatting to get a "vanilla" Markdown output without embedded HTML. I settled on this command for the moment:
> 
> ```sh
> pandoc failing_table_tidy_reduced.html -f html-native_divs-native_spans -t gfm-raw_html -o failing_table_tidy_reduced.md
> ```
> 
> **(The contents of `failing_table_tidy_reduced.html` are pasted below.)**
> 
> The Markdown output is OK for the most part, except that the table headers are systematically broken. I get this for the example file that is pasted below:
> 
> ```md
> |                                                |                                               |                                                                                                                                                         |
> |------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
> | Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
> | ![](attachments/314948158/314950704.png)       | ![](attachments/314948158/314950710.png)      | ![](attachments/314948158/314950785.png)                                                                                                                |
> ```
> 
> Whereas I expect the text (i.e. "Step N: ...") to be in the table header, like so:
> 
> ```md
> | Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
> |------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
> | ![](attachments/314948158/314950704.png)       | ![](attachments/314948158/314950710.png)      | ![](attachments/314948158/314950785.png)                                                                                                                |
> ```
> 
> What am I doing wrong?
> 
> ---
> This is the content of `failing_table_tidy_reduced.html`:
> 
> ```html
> <!DOCTYPE html>
> <html>
> <head>
> <meta name="generator" content=
> "HTML Tidy for HTML5 for Linux version 5.6.0">
> <title>Title</title>
> <link rel="stylesheet" href="styles/site.css" type="text/css">
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> <style type='text/css'>
> /*<![CDATA[*/
> div.rbtoc1689000519714 {padding: 0px;}
> div.rbtoc1689000519714 ul {margin-left: 0px;}
> div.rbtoc1689000519714 li {margin-left: 0px;padding-left: 0px;}
> 
> /*]]>*/
> </style>
> </head>
> <body class="theme-default aui-theme-default">
> <div class="table-wrap">
> <table class="wrapped relative-table confluenceTable" style=
> "width: 48.0112%;">
> <colgroup>
> <col style="width: 27.3364%;">
> <col style="width: 28.271%;">
> <col style="width: 44.3925%;"></colgroup>
> <tbody>
> <tr>
> <th class="confluenceTh">
> <p>Step 1: Select to open image as virtual stack.</p>
> </th>
> <th class="confluenceTh">
> <p>Step 2: Select image folder and open dataset.</p>
> </th>
> <th class="confluenceTh">Step 3: View with opened image stack. Use
> the slider of in the phase contrast histogram (top) to adjust image
> saturation for better channel visibility.</th>
> </tr>
> <tr>
> <td colspan="1" class="confluenceTd">
> <div class="content-wrapper">
> <p><span class=
> "confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image confluence-thumbnail"
> draggable="false" height="250" src=
> "attachments/314948158/314950704.png" data-image-src=
> "attachments/314948158/314950704.png"
> data-unresolved-comment-count="0" data-linked-resource-id=
> "314950704" data-linked-resource-version="1"
> data-linked-resource-type="attachment"
> data-linked-resource-default-alias="image2022-4-26_15-0-46.png"
> data-base-url="https://my.url.com"
> data-linked-resource-content-type="image/png"
> data-linked-resource-container-id="314948158"
> data-linked-resource-container-version="61" alt=""></span></p>
> </div>
> </td>
> <td colspan="1" class="confluenceTd">
> <div class="content-wrapper">
> <p><span class=
> "confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image confluence-thumbnail"
> draggable="false" height="250" src=
> "attachments/314948158/314950710.png" data-image-src=
> "attachments/314948158/314950710.png"
> data-unresolved-comment-count="0" data-linked-resource-id=
> "314950710" data-linked-resource-version="1"
> data-linked-resource-type="attachment"
> data-linked-resource-default-alias="image2022-4-26_15-1-20.png"
> data-base-url="https://my.url.com"
> data-linked-resource-content-type="image/png"
> data-linked-resource-container-id="314948158"
> data-linked-resource-container-version="61" alt=""></span></p>
> </div>
> </td>
> <td colspan="1" class="confluenceTd">
> <div class="content-wrapper">
> <p><span class=
> "confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image"
> draggable="false" height="250" src=
> "attachments/314948158/314950785.png" data-image-src=
> "attachments/314948158/314950785.png"
> data-unresolved-comment-count="0" data-linked-resource-id=
> "314950785" data-linked-resource-version="1"
> data-linked-resource-type="attachment"
> data-linked-resource-default-alias="image2022-4-26_15-12-47.png"
> data-base-url="https://my.url.com"
> data-linked-resource-content-type="image/png"
> data-linked-resource-container-id="314948158"
> data-linked-resource-container-version="61" alt=""></span></p>
> </div>
> </td>
> </tr>
> </tbody>
> </table>
> </div>
> </body>
> </html>
> ```
> 
> -- 
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e4b6b290-ab59-4ff6-83ac-47b017e033f5n%40googlegroups.com <https://groups.google.com/d/msgid/pandoc-discuss/e4b6b290-ab59-4ff6-83ac-47b017e033f5n%40googlegroups.com?utm_medium=email&utm_source=footer>.