From: Kolen Cheung
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: bug: docx (containing table) to native and docx to markdown then to native is hugely different
Date: Tue, 6 Dec 2016 23:06:11 -0800 (PST)
Message-ID: <3c212e85-1e24-4fa2-817e-051e55f5821d@googlegroups.com>

I have a doc with 2 tables. I converted it to docx (with Word) and then tried to use pandoc to convert it to md.

I got some strange errors, and when looking at the native output I found this:

  1. pandoc -t native others/Mirrors-Lens.docx -o others/Mirrors-Lens.native
     produces output that looks more or less normal.
  2. pandoc -t markdown others/Mirrors-Lens.docx | pandoc -f markdown -t native -o others/Mirrors-Lens-round.native
     produces output that is hugely different from the above.
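
One way to see exactly where the two native files diverge is a line-by-line diff. The following is only a minimal sketch using Python's standard difflib module; the function name native_diff and the sample strings are illustrative assumptions, not the contents of the actual files.

```python
import difflib
from typing import List


def native_diff(direct: str, round_trip: str) -> List[str]:
    """Return unified-diff lines between two pandoc native dumps."""
    return list(
        difflib.unified_diff(
            direct.splitlines(),
            round_trip.splitlines(),
            fromfile="Mirrors-Lens.native",
            tofile="Mirrors-Lens-round.native",
            lineterm="",  # input lines carry no newline, so add none
        )
    )


if __name__ == "__main__":
    # Hypothetical stand-ins for the text of the two .native files.
    direct = '[Para [Str "a"]\n,Para [Str "b"]]'
    round_trip = '[Para [Str "a"]\n,Para [Str "c"]]'
    for line in native_diff(direct, round_trip):
        print(line)
```

In practice the two arguments would be the text read from others/Mirrors-Lens.native and others/Mirrors-Lens-round.native. An empty result means the round trip preserved the AST.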

One of the main differences is that both tables (originally multi-column) collapsed to a single column in the latter (round-trip) conversion. (Another annoyance is that I tried to use my pantable2csv filter to capture the tables from the docx directly into CSV, but that resulted in an error.)
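
The column collapse can be checked mechanically by counting the columns of each table in the two native dumps. The sketch below assumes the pandoc 1.x native layout, in which each Table carries exactly one alignment list such as [AlignDefault,AlignDefault]; later pandoc versions print tables differently, so this would need adjusting there. The name table_column_counts and the sample strings are hypothetical.

```python
import re
from typing import List

# One alignment constructor, e.g. AlignDefault.
_ALIGN = r"Align(?:Left|Right|Center|Default)"
# A non-empty bracketed list of alignments, e.g. [AlignLeft,AlignDefault].
_ALIGN_LIST = re.compile(rf"\[\s*({_ALIGN}(?:\s*,\s*{_ALIGN})*)\s*\]")


def table_column_counts(native: str) -> List[int]:
    """Return the number of columns of each table, in document order.

    Assumes one alignment list per Table (pandoc 1.x native output).
    """
    return [len(m.group(1).split(",")) for m in _ALIGN_LIST.finditer(native)]


if __name__ == "__main__":
    # Hypothetical excerpts standing in for the two .native files.
    direct = 'Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]'
    round_trip = 'Table [] [AlignDefault] [0.0]'
    print(table_column_counts(direct))      # [3]
    print(table_column_counts(round_trip))  # [1]
```

Comparing the two lists shows at a glance which tables lost columns in the round trip, without reading the full native output.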

For the 2nd table, I’m guessing the presence of <, > causes problems. For the 1st I have no clue, since the native looks fine to me.

But since I do not own the copyright of the file, I am not going to share it publicly (it isn’t really important, but just in case…). If anyone is interested in helping to solve the puzzle / debug pandoc, please give me your GitHub account and I can open a private repository and invite you. Thanks!
