public inbox archive for pandoc-discuss@googlegroups.com
From: "'William Lupton' via pandoc-discuss" <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Converting HTML tables to Markdown
Date: Sun, 28 May 2023 22:10:17 +0100
Message-ID: <CAEe_xxgyB95nEYoOziSr84wfGNaj8bZNNo07vVORpF0KqQO7UQ@mail.gmail.com>
In-Reply-To: <3b196e5a-93f8-77dd-366d-9bcff734ce64-8DM0qNeCP8OsTnJN9+BGXg@public.gmane.org>

I think your HTML must contain tables that cannot be represented in gfm,
so pandoc leaves them as raw HTML (which I believe is valid gfm). When you
specify -raw_html you forbid pandoc from doing this, which I guess is why
it outputs [TABLE] instead. As for the empty output in the last case, when
I tried it I got a "The extension native_divs is not supported for gfm"
error, which is presumably why no output was generated.
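
To find out which tables are affected, something like the following
might help. It is only a rough sketch and not pandoc's own logic: the
thread does not say which features block a Markdown table, so the script
assumes that merged cells (rowspan/colspan) and block-level elements
inside cells are the likely culprits. The names BLOCK_TAGS, TableAudit
and audit_tables are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: elements that usually force block-level content in a cell.
BLOCK_TAGS = {"ul", "ol", "dl", "pre", "blockquote", "div", "table",
              "h1", "h2", "h3", "h4", "h5", "h6", "hr"}


class TableAudit(HTMLParser):
    """Collect, per top-level table, reasons it may not fit a pipe table."""

    def __init__(self):
        super().__init__()
        self.tables = []      # one list of reasons per top-level table
        self.depth = 0        # current <table> nesting depth
        self.in_cell = False  # inside a <td>/<th> of a top-level table

    def _note(self, reason):
        # Record each distinct reason once for the current table.
        if reason not in self.tables[-1]:
            self.tables[-1].append(reason)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 1:
                self.tables.append([])
                self.in_cell = False
                return
        if self.depth == 0:
            return
        if self.depth == 1 and tag in ("td", "th"):
            self.in_cell = True
            for name, value in attrs:
                if name in ("rowspan", "colspan") and (value or "1").strip() != "1":
                    self._note(f"{name}={value}")
        elif self.in_cell and tag in BLOCK_TAGS:
            self._note(f"<{tag}> inside a cell")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th"):
            self.in_cell = False


def audit_tables(html):
    """Return one list of suspected problems for each top-level table."""
    parser = TableAudit()
    parser.feed(html)
    parser.close()
    return parser.tables


simple = ("<table><tr><th>A</th><th>B</th></tr>"
          "<tr><td>1</td><td>2</td></tr></table>")
merged = ("<table><tr><td colspan='2'>wide</td></tr>"
          "<tr><td>1</td><td><ul><li>x</li></ul></td></tr></table>")

print(audit_tables(simple))  # [[]]
print(audit_tables(merged))  # [['colspan=2', '<ul> inside a cell']]
```

An empty list for a table means the script found nothing suspicious; it
does not guarantee that pandoc will emit a Markdown table for it.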

On Sun, 28 May 2023 at 13:04, <mails.lists.2012-1-rRloVJBGZzogAv4oPwG0Al6hYfS7NtTn@public.gmane.org> wrote:

> Hi all,
>
>
> I'm a bit clueless with HTML table conversion at the moment.
>
> I currently use pandoc 3.1.2 on x64 and I'm converting some HTML dumps
> into Markdown (gfm).
>
> I read the manpage, the web site docs and googled, but apparently missed
> the crucial pointer so far.
>
>
> In many cases, by default tables end up as raw HTML in the Markdown output.
>
> I tried to circumvent this by using
>
>     pandoc -f html -t gfm-raw_html
>
> However, instead of the actual table, only the following text is being
> output then:
>
>     [TABLE]
>
> That's obviously not what I want.
>
> If I add something like
>
>     -native_divs-native_spans-fenced_divs-bracketed_spans
>
> to my output format spec, nothing is output any more for my affected test
> file, i.e. the output stays totally empty.
>
>
> I just want my HTML table to be converted into a corresponding Markdown
> table, at least as well as it can be expressed in Markdown. I'm aware that
> HTML tables allow for more features and in many cases cannot be converted
> perfectly, or only with some information loss or adaptation.
>
> However, getting just the word "TABLE" as the output is too much of a
> simplification in my eyes, with all table content being completely lost...
>
>
> Best regards,
>
>   Gunter

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAEe_xxgyB95nEYoOziSr84wfGNaj8bZNNo07vVORpF0KqQO7UQ%40mail.gmail.com.
