From: "'William Lupton' via pandoc-discuss"
Newsgroups: gmane.text.pandoc
Subject: Re: Converting HTML tables to Markdown
Date: Sun, 28 May 2023 22:10:17 +0100
References: <3b196e5a-93f8-77dd-366d-9bcff734ce64@ohrner.net>

I think your HTML must contain tables that cannot be represented in gfm
and are therefore left as raw HTML (which I believe is valid gfm). When
you specify -raw_html you forbid pandoc from doing this, so I guess that
is why it outputs [TABLE]. As for the empty output in the last case, when
I tried it I got a "The extension native_divs is not supported for gfm"
error, which is presumably why no output was generated.
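
For what it's worth, here is a minimal sketch of how this behaviour can be reproduced. The file name complex-table.html and its contents are made up for illustration (one cell contains a bullet list, which a pipe table cannot hold). I'm assuming pandoc 3.x is on the PATH, and the exact HTML that pandoc emits may differ between versions.

```shell
#!/bin/sh
# Sketch only: complex-table.html and its contents are hypothetical test data.
# The second body cell holds a bullet list, which a gfm pipe table cannot express.
cat > complex-table.html <<'EOF'
<table>
  <tr><th>Item</th><th>Notes</th></tr>
  <tr><td>A</td><td><ul><li>first</li><li>second</li></ul></td></tr>
</table>
EOF

# Default gfm output: raw_html is enabled, so the table is kept as raw HTML.
pandoc -f html -t gfm complex-table.html

# raw_html disabled: no fallback is left, so a [TABLE] placeholder is written.
pandoc -f html -t gfm-raw_html complex-table.html

# List the extensions that can be toggled for gfm before adding any to the
# format spec; as far as I know, one that is not listed is rejected with an error.
pandoc --list-extensions=gfm
```

A table whose cells each contain only a single run of inline text should come out as a pipe table with either of the first two commands.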

On Sun, 28 May 2023 at 13:04, wrote:

> Hi all,
>
> I'm a bit clueless with HTML table conversion at the moment.
>
> I currently use pandoc 3.1.2 on x64 and I'm converting some HTML dumps
> into Markdown (gfm).
>
> I read the manpage and the web site docs and googled, but have apparently
> missed the crucial pointer so far.
>
> In many cases, tables end up as raw HTML in the Markdown output by
> default.
>
> I tried to circumvent this by using
>
>     pandoc -f html -t gfm-raw_html
>
> However, instead of the actual table, only the following text is then
> output:
>
>     [TABLE]
>
> That's obviously not what I want.
>
> If I add something like
>
>     -native_divs-native_spans-fenced_divs-bracketed_spans
>
> to my output format spec, nothing is output any more for my affected test
> file, i.e. the output stays totally empty.
>
> I just want my HTML table to be converted into a corresponding Markdown
> table, at least as well as it can be expressed in Markdown. I'm aware that
> HTML tables allow for more features and in many cases cannot be converted
> perfectly, or not without some information loss or adaptation.
>
> However, just getting the word "TABLE" in the output is too much of a
> simplification in my eyes, with all table content being completely lost...
>
> Best regards,
>
>   Gunter
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/3b196e5a-93f8-77dd-366d-9bcff734ce64%40ohrner.net.

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/pandoc-discuss/CAEe_xxgyB95nEYoOziSr84wfGNaj8bZNNo07vVORpF0KqQO7UQ%40mail.gmail.com.