From: "'Michael Mell' via pandoc-discuss"
Newsgroups: gmane.text.pandoc
Subject: Pandoc breaks table headers when converting HTML (exported from Confluence) to Github Flavored Markdown
Date: Thu, 13 Jul 2023 23:03:25 -0700 (PDT)
To: pandoc-discuss

I am trying to convert HTML pages from our Confluence Wiki to GitHub Flavored Markdown for the GitHub Wiki.

I want to remove all formatting to get a "vanilla" Markdown output without embedded HTML. I settled on this command for the moment:

```sh
pandoc failing_table_tidy_reduced.html -f html-native_divs-native_spans -t gfm-raw_html -o failing_table_tidy_reduced.md
```

**(The contents of `failing_table_tidy_reduced.html` are pasted below.)**

The Markdown output is OK for the most part, except that the table headers are systematically broken. I get this for the example file that is pasted below:

```md
|                                                |                                               |                                                                                                                                                         |
|------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
| ![](attachments/314948158/314950704.png)       | ![](attachments/314948158/314950710.png)      | ![](attachments/314948158/314950785.png)                                                                                                                |
```

Instead, I expect the text (i.e. "Step N: ...") to be in the table header, like so:

```md
| Step 1: Select to open image as virtual stack. | Step 2: Select image folder and open dataset. | Step 3: View with opened image stack. Use the slider of in the phase contrast histogram (top) to adjust image saturation for better channel visibility. |
|------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| ![](attachments/314948158/314950704.png)       | ![](attachments/314948158/314950710.png)      | ![](attachments/314948158/314950785.png)                                                                                                                |
```

What am I doing wrong?

---
This is the content of `failing_table_tidy_reduced.html`:

```html
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.6.0">
<title>Title</title>
<link rel="stylesheet" href="styles/site.css" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type='text/css'>
/*<![CDATA[*/
div.rbtoc1689000519714 {padding: 0px;}
div.rbtoc1689000519714 ul {margin-left: 0px;}
div.rbtoc1689000519714 li {margin-left: 0px;padding-left: 0px;}

/*]]>*/
</style>
</head>
<body class="theme-default aui-theme-default">
<div class="table-wrap">
<table class="wrapped relative-table confluenceTable" style=
"width: 48.0112%;">
<colgroup>
<col style="width: 27.3364%;">
<col style="width: 28.271%;">
<col style="width: 44.3925%;"></colgroup>
<tbody>
<tr>
<th class="confluenceTh">
<p>Step 1: Select to open image as virtual stack.</p>
</th>
<th class="confluenceTh">
<p>Step 2: Select image folder and open dataset.</p>
</th>
<th class="confluenceTh">Step 3: View with opened image stack. Use
the slider of in the phase contrast histogram (top) to adjust image
saturation for better channel visibility.</th>
</tr>
<tr>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image confluence-thumbnail"
draggable="false" height="250" src=
"attachments/314948158/314950704.png" data-image-src=
"attachments/314948158/314950704.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950704" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-0-46.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image confluence-thumbnail"
draggable="false" height="250" src=
"attachments/314948158/314950710.png" data-image-src=
"attachments/314948158/314950710.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950710" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-1-20.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
<td colspan="1" class="confluenceTd">
<div class="content-wrapper">
<p><span class=
"confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image"
draggable="false" height="250" src=
"attachments/314948158/314950785.png" data-image-src=
"attachments/314948158/314950785.png"
data-unresolved-comment-count="0" data-linked-resource-id=
"314950785" data-linked-resource-version="1"
data-linked-resource-type="attachment"
data-linked-resource-default-alias="image2022-4-26_15-12-47.png"
data-base-url="https://my.url.com"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="314948158"
data-linked-resource-container-version="61" alt=""></span></p>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
```
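
Since the problem is described as systematic across the converted wiki pages, it may help to list every generated Markdown file that shows the same symptom. The following is only a minimal sketch: `find_empty_table_headers` is a hypothetical helper name (it is not part of pandoc), and it assumes the output uses pipe tables like the ones above, where the header row sits directly above the `|---|` delimiter row. It only detects the empty header row; it does not repair it.

```sh
# Hypothetical helper (not part of pandoc): report pipe tables whose header
# row contains only blank cells, i.e. the symptom shown in the output above.
find_empty_table_headers() {
    awk '
        # Forget the previous line when a new input file starts.
        FNR == 1 { prev = "" }

        # A delimiter row holds only pipes, dashes, colons and spaces,
        # and contains at least one dash.
        /^\|[-:| ]+\|[ \t]*$/ && /-/ {
            # The header row is the line directly above the delimiter row.
            # Flag it when it consists of nothing but pipes and whitespace.
            if (prev ~ /^\|[| \t]*\|[ \t]*$/)
                printf "%s:%d: empty table header\n", FILENAME, FNR - 1
        }

        { prev = $0 }
    ' "$@"
}
```

After sourcing the function, something like `find_empty_table_headers *.md` prints one `file:line: empty table header` line per affected table, where the line number points at the blank header row.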

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e4b6b290-ab59-4ff6-83ac-47b017e033f5n%40googlegroups.com.