From: Kolen Cheung
Newsgroups: gmane.text.pandoc
Subject: Re: bug: docx (containing table) to native and docx to markdown then to native is hugely different
Date: Wed, 7 Dec 2016 01:57:53 -0800 (PST)
Message-ID: <5c10d3f8-8448-4b5f-bfcb-f9c2a4897ec4@googlegroups.com>
References: <3c212e85-1e24-4fa2-817e-051e55f5821d@googlegroups.com>
To: pandoc-discuss

I'm not sure I've correctly identified the problem: the docx reader might be treating the table as having only a header row, with an empty table body. The structure is something like this:

```json
[Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]
 [[Para [Str "x",Space,Str "y"]]
 ,[Para [Strong [Emph [Str "a",Space,Str "b"]]]]
 ,[Para [Strong [Emph [Str "Math"]]]]]
 []]
```

Lines 2-4 of the snippet seem to be the header row, and the final `[]` is the table body, which has length 0 but should have a length of 3. Panflute asserts that it does, which explains the error I got from my filter.

Pandoc itself reads this just fine, and it is indeed what pandoc's docx reader outputs. On the other hand, pandoc's writers, such as markdown and html, seem to handle this input incorrectly. Is it a valid pandoc AST?

And as a general rule, is it safe to assert that the alignment list, the width list, the header list, and each row in the row list all have the same length?
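A minimal sketch of the length check asked about above, in plain Python with no pandoc or panflute dependency. The function name `table_shape_problems` and the plain-list stand-in for the table fields are assumptions made for illustration only. Whether pandoc actually guarantees these invariants is the open question of this post, so the sketch reports mismatches instead of asserting.

```python
from typing import Any, List, Sequence


def table_shape_problems(
    aligns: Sequence[Any],
    widths: Sequence[float],
    header: Sequence[Any],
    rows: Sequence[Sequence[Any]],
) -> List[str]:
    """Return human-readable shape mismatches; an empty list means consistent.

    Hypothetical helper: the column count is taken from the alignment list,
    and every other per-column list is compared against it.
    """
    n_cols = len(aligns)
    problems: List[str] = []
    if len(widths) != n_cols:
        problems.append(f"widths has {len(widths)} entries, expected {n_cols}")
    if len(header) != n_cols:
        problems.append(f"header has {len(header)} cells, expected {n_cols}")
    # The row list may be empty (a table with no body rows); only the
    # length of each individual row is compared with the column count.
    for i, row in enumerate(rows):
        if len(row) != n_cols:
            problems.append(f"row {i} has {len(row)} cells, expected {n_cols}")
    return problems


# Stand-in for the table in the post: three columns, a header, no body rows.
aligns = ["AlignDefault"] * 3
widths = [0.0, 0.0, 0.0]
header = [["x y"], ["a b"], ["Math"]]
rows: List[List[Any]] = []

print(table_shape_problems(aligns, widths, header, rows))
```

Under this check, the length of the row list is the number of body rows, not the number of columns, so a header-only table is not flagged. A stricter filter could additionally reject an empty row list if it needs at least one body row.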