From: Jesse Rosenthal
Newsgroups: gmane.text.pandoc
Subject: Please give the Docx reader a test drive
Date: Mon, 11 Aug 2014 17:55:02 -0400

Dear All,

The MS Word docx reader in the new pandoc is working pretty well these days. Before the next release, though, I'd love it if we could run as many real-world Word docs through it as possible, to catch any odd behavior. Documents from as many different academic/professional fields as possible would be ideal, since I know everyone uses Word a bit differently.
Everyone who has tested it so far has brought some oversight to my attention, so I'd love to get more eyes on it. If you do try it out and find something that doesn't behave correctly, please open an issue on my pandoc fork (), and send me the document by email if it's possible to share it. If you can't share it, it would be great if you could try to reproduce the issue in a different document.

Some notes:

- All text, and all text formatting (unless it comes from an unusual style), should be preserved. If it isn't, it's a bug.

- With a few exceptions, there's not much we can do with ad-hoc visual stylization: making columns by pressing space a lot, or pressing return to make the end of a line a bit prettier. The rule of thumb is: would the property in question survive a change in margins and font? If so, we should probably be able to interpret it. If not, we probably can't.

- Headers, titles and the like will be interpreted correctly if they have the correct style. The reader can't guess that something is a header just because its text is bold or uses another font. (At some point in the future, though, I might introduce a filter with some heuristics for guessing.)

- Block quotes should be picked up either from the Quote or BlockQuote styles or from block indentation. If someone uses another style to produce a block quote, please let me know so I can add it to the list.

- Tracked changes are handled with the --track-changes=accept|reject|all option. "accept" takes the insertions, "reject" sticks with the deletions, and "all" puts in everything, marked up with spans.

- Equations should appear as LaTeX.

Anyway, please do give it a try and let me know, through the channels above, what weirdnesses you encounter.

To get the development pandoc, it's probably best to use a cabal sandbox (available, I believe, in cabal >= 1.18):
    git clone https://github.com/jgm/pandoc.git
    cd pandoc
    cabal update
    cabal sandbox init
    cabal install --only-dependencies
    cabal install

The binary will then be located in pandoc/.cabal-sandbox/bin.

Thanks,
Jesse
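P.S. If you want to compare the three --track-changes modes on one of your documents, a shell function along these lines should do it. It's only a sketch: it assumes the pandoc you just built is first on your PATH (for example, by adding pandoc/.cabal-sandbox/bin to it), and the function name convert_all_modes and the FILE-MODE.md output names are just illustrative choices, not anything built into pandoc.

```shell
# Hypothetical helper (not part of pandoc): convert one .docx file once
# per --track-changes mode, writing FILE-accept.md, FILE-reject.md and
# FILE-all.md next to the input.
# Assumes a pandoc with the docx reader is first on your PATH.
convert_all_modes() {
    input="$1"
    base="${input%.docx}"   # strip the .docx extension, if present
    for mode in accept reject all; do
        # accept: take the insertions
        # reject: stick with the deletions
        # all:    keep everything, marked up with spans
        pandoc --track-changes="$mode" -f docx -t markdown \
            -o "${base}-${mode}.md" "$input" || return 1
    done
}
```

Running convert_all_modes thesis.docx and then diffing thesis-accept.md against thesis-reject.md should make it easy to see whether the insertions and deletions were picked up as you expected.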