From: mails.lists.2012-1-rRloVJBGZzogAv4oPwG0Al6hYfS7NtTn@public.gmane.org
Newsgroups: gmane.text.pandoc
Subject: Converting HTML tables to Markdown
Date: Sun, 28 May 2023 14:04:24 +0200
Message-ID: <3b196e5a-93f8-77dd-366d-9bcff734ce64@ohrner.net>

Hi all,


I'm a bit clueless with HTML table conversion at the moment.

I currently use pandoc 3.1.2 on x64 and I'm converting some HTML dumps into Markdown (gfm).

I have read the man page and the website docs and googled, but have apparently missed the crucial pointer so far.


In many cases, by default tables end up as raw HTML in the Markdown output.

I tried to circumvent this by using

    pandoc -f html -t gfm-raw_html

However, instead of the actual table, only the following text is output:

    [TABLE]

That's obviously not what I want.

If I add something like

    -native_divs-native_spans-fenced_divs-bracketed_spans

to my output format spec, nothing is output any more for my affected test file, i.e. the output stays totally empty.
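
The two invocations described above can be reproduced with a small script. This is only a sketch under stated assumptions: the sample table and the helper names `build_command` and `run_pandoc` are hypothetical, and the second format spec assumes the extra extensions were appended to `gfm-raw_html`. Only the `pandoc -f html -t ...` arguments come from the commands above. The script expects `pandoc` on the PATH and reports when it is missing.

```python
import shutil
import subprocess

# Hypothetical sample input; replace it with the HTML that shows the problem.
SAMPLE_HTML = """<table>
<tr><th colspan="2">Header</th></tr>
<tr><td>a</td><td>b</td></tr>
</table>
"""

# Format specs from the post. The second one is an assumption about how
# the extra extensions were combined with gfm-raw_html.
FORMAT_SPECS = [
    "gfm-raw_html",
    "gfm-raw_html-native_divs-native_spans-fenced_divs-bracketed_spans",
]


def build_command(output_format="gfm-raw_html"):
    """Return the pandoc argument list: HTML input, given output format."""
    return ["pandoc", "-f", "html", "-t", output_format]


def run_pandoc(html, output_format="gfm-raw_html"):
    """Run pandoc on `html`.

    Returns (stdout, stderr, exit code), or None if pandoc is not installed.
    """
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        build_command(output_format),
        input=html,
        capture_output=True,
        text=True,
    )
    return result.stdout, result.stderr, result.returncode


if __name__ == "__main__":
    for spec in FORMAT_SPECS:
        outcome = run_pandoc(SAMPLE_HTML, spec)
        print("format spec:", spec)
        if outcome is None:
            print("  pandoc not found on PATH")
            continue
        stdout, stderr, code = outcome
        print("  exit code:", code)
        print("  stdout:", repr(stdout))
        print("  stderr:", repr(stderr))
```

Printing the exit code and stderr next to stdout may help to tell a conversion that produced no text from a run that failed with a message.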


I just want my HTML table to be converted into a corresponding Markdown table, at least as well as it can be expressed in Markdown. I'm aware that HTML tables allow for more features and that in many cases they cannot be converted perfectly, or only with some information loss or adaptations.

However, getting just the word "TABLE" in the output is too much of a simplification in my eyes, with all table content being completely lost...


Best regards,

  Gunter
