public inbox archive for pandoc-discuss@googlegroups.com
From: Kolen Cheung <christian.kolen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: bug: docx (containing table) to native and docx to markdown then to native is hugely different
Date: Wed, 7 Dec 2016 01:57:53 -0800 (PST)
Message-ID: <5c10d3f8-8448-4b5f-bfcb-f9c2a4897ec4@googlegroups.com>
In-Reply-To: <3c212e85-1e24-4fa2-817e-051e55f5821d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>



I'm not sure I've correctly identified the problem: the docx reader seems 
to treat the table as having only a header row, with an empty table body. 
The structure is something like this:

```haskell
[Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]
 [[Para [Str "x",Space,Str "y"]]
 ,[Para [Strong [Emph [Str "a",Space,Str "b"]]]]
 ,[Para [Strong [Emph [Str "Math"]]]]]
 []]
```
Lines 2-4 of the snippet appear to be the header row, and the final `[]` 
is the table body, which is empty, although I would expect it to have a 
length of 3. Panflute asserts that this holds, which explains the error I 
got from my filter.

Pandoc reads this just fine, and it is indeed what pandoc's docx reader 
outputs. On the other hand, pandoc's writers, such as markdown and html, 
seem to handle this input incorrectly. Is it a valid pandoc AST?

And as a general rule, is it safe to assert that the align-list, the 
width-list, the header-list, and each row in the row-list all have the 
same length?
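
To make the question concrete, here is a minimal sketch of the check I have 
in mind, written in plain Python rather than against panflute's actual API. 
The function names `table_lengths` and `columns_agree` are hypothetical, and 
the cell contents are simplified stand-ins for the real block lists in the 
snippet above.

```python
from typing import Dict, List, Sequence


def table_lengths(aligns: Sequence, widths: Sequence,
                  header: Sequence, rows: Sequence[Sequence]) -> Dict[str, object]:
    """Report the length of every column-wise list in a table."""
    return {
        "aligns": len(aligns),
        "widths": len(widths),
        "header": len(header),
        "rows": [len(row) for row in rows],  # one entry per body row
    }


def columns_agree(aligns: Sequence, widths: Sequence,
                  header: Sequence, rows: Sequence[Sequence]) -> bool:
    """True if widths, header, and every body row match len(aligns)."""
    n = len(aligns)
    return (
        len(widths) == n
        and len(header) == n
        and all(len(row) == n for row in rows)
    )


# Simplified stand-in for the table above (assumption: cells reduced to strings).
aligns: List[str] = ["AlignDefault"] * 3
widths: List[float] = [0.0] * 3
header: List[List[str]] = [["x y"], ["a b"], ["Math"]]
rows: List[List[List[str]]] = []  # the empty table body

print(table_lengths(aligns, widths, header, rows))
print(columns_agree(aligns, widths, header, rows))
```

Note that with an empty body the per-row check passes vacuously, so a filter 
that also requires at least one body row has to test that separately.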


Thread overview: 3+ messages
2016-12-07  7:06 Kolen Cheung
     [not found] ` <3c212e85-1e24-4fa2-817e-051e55f5821d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2016-12-07  9:57   ` Kolen Cheung [this message]
2016-12-07 10:23   ` Kolen Cheung
