From: "'Michael Mell' via pandoc-discuss" <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Pandoc breaks table headers when converting HTML (exported from Confluence) to Github Flavored Markdown
Date: Thu, 13 Jul 2023 23:03:25 -0700 (PDT)
Message-ID: <e4b6b290-ab59-4ff6-83ac-47b017e033f5n@googlegroups.com>


I am trying to convert HTML pages exported from our Confluence wiki to
GitHub Flavored Markdown (GFM) for a GitHub wiki.

I want plain Markdown output with no embedded HTML and no
Confluence-specific formatting. For now I have settled on this command:

```sh
pandoc failing_table_tidy_reduced.html \
  -f html-native_divs-native_spans \
  -t gfm-raw_html \
  -o failing_table_tidy_reduced.md
```

**(The contents of `failing_table_tidy_reduced.html` are pasted below.)**

The Markdown output is mostly fine, except that table headers are
consistently broken: the header row comes out empty and the header text
lands in the first body row. For the example file pasted below I get:

```md
|                                                |                                               |                                                                                                                                                         |
|------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
| ![](attachments/314948158/314950704.png)       | ![](attachments/314948158/314950710.png)      | ![](attachments/314948158/314950785.png)                                                                                                                |
```
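
To gauge how many converted pages show this symptom, a check along the following lines could flag Markdown files whose pipe tables have a blank header row. This is only a rough sketch: `has_blank_header` is a hypothetical helper name, and it assumes every table row starts and ends with `|`, as in the output above.

```python
import re

def has_blank_header(markdown: str) -> bool:
    """Return True if any pipe table in `markdown` has an all-blank header row.

    Hypothetical helper: assumes rows start and end with '|'.
    """
    lines = markdown.splitlines()
    # Look at each pair of consecutive lines: (candidate header, candidate delimiter).
    for header, delimiter in zip(lines, lines[1:]):
        # A delimiter row looks like |-----|:---:|-----|
        if not re.fullmatch(r"\|(\s*:?-+:?\s*\|)+", delimiter.strip()):
            continue
        if not header.strip().startswith("|"):
            continue
        cells = header.strip().strip("|").split("|")
        if all(cell.strip() == "" for cell in cells):
            return True
    return False
```

Reading each generated `.md` file and passing its text to this function would list the affected pages.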

I expected the header text (i.e. "Step N: ...") to be in the table header
row, like so:

```md
| Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
|------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| ![](attachments/314948158/314950704.png)       | ![](attachments/314948158/314950710.png)      | ![](attachments/314948158/314950785.png)                                                                                                                |
```

What am I doing wrong?
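
One thing that stands out in the input is that the `<th>` row sits inside `<tbody>` and there is no `<thead>`. Whether that is what causes the empty header is an assumption, not something confirmed here. If it is, a pre-processing step like the sketch below might serve as a workaround. `promote_header_rows` is a hypothetical helper, and the regex only handles a plain `<tbody>` tag without attributes and tables that are not nested.

```python
import re

# Matches a bare <tbody> tag immediately followed by its first <tr>...</tr> row.
FIRST_ROW = re.compile(r"<tbody>\s*(<tr\b[^>]*>.*?</tr>)", re.DOTALL | re.IGNORECASE)

def promote_header_rows(html: str) -> str:
    """Wrap a leading all-<th> body row in <thead> (hypothetical helper)."""
    def fix(match: re.Match) -> str:
        row = match.group(1)
        has_th = re.search(r"<th\b", row, re.IGNORECASE)
        has_td = re.search(r"<td\b", row, re.IGNORECASE)
        # Only move rows that contain <th> cells and no <td> cells.
        if has_th and not has_td:
            return f"<thead>\n{row}\n</thead>\n<tbody>"
        return match.group(0)  # leave every other table untouched
    return FIRST_ROW.sub(fix, html)
```

The rewritten HTML could then be saved and passed to the same pandoc command to see whether the header row is picked up.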

---
This is the content of `failing_table_tidy_reduced.html`:

```html
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.6.0">
<title>Title</title>
<link rel="stylesheet" href="styles/site.css" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type='text/css'>
/*<![CDATA[*/
div.rbtoc1689000519714 {padding: 0px;}
div.rbtoc1689000519714 ul {margin-left: 0px;}
div.rbtoc1689000519714 li {margin-left: 0px;padding-left: 0px;}

/*]]>*/
</style>
</head>
<body class="theme-default aui-theme-default">
<div class="table-wrap">
<table class="wrapped relative-table confluenceTable" style=
"width: 48.0112%;">
<colgroup>
<col style="width: 27.3364%;">
<col style="width: 28.271%;">
<col style="width: 44.3925%;"></colgroup>
<tbody>
<tr>
<th class="confluenceTh">
<p>Step 1: Select to open image as virtual stack.</p>
</th>
<th class="confluenceTh">
<p>Step 2: Select image folder and open dataset.</p>
</th>
<th class="confluenceTh">Step 3: View with opened image stack. Use
the slider of in the phase contrast histogram (top) to adjust image
saturation for better channel visibility.</th>
</tr>
<tr>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img 
class="confluence-embedded-image confluence-thumbnail"
draggable="false" height="250" src=
"attachments/314948158/314950704.png" data-image-src=
"attachments/314948158/314950704.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950704" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-0-46.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img 
class="confluence-embedded-image confluence-thumbnail"
draggable="false" height="250" src=
"attachments/314948158/314950710.png" data-image-src=
"attachments/314948158/314950710.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950710" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-1-20.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img 
class="confluence-embedded-image"
draggable="false" height="250" src=
"attachments/314948158/314950785.png" data-image-src=
"attachments/314948158/314950785.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950785" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-12-47.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
```


Thread overview: 2+ messages
2023-07-14  6:03 'Michael Mell' via pandoc-discuss [this message]
2023-07-14 18:39 ` John MacFarlane