From: Jesse Rosenthal
Newsgroups: gmane.text.pandoc
Subject: Re: UTF-8 error when converting Docx to Markdown
Date: Wed, 31 Dec 2014 17:34:46 -0500
Message-ID: <2ce9bf$1t0b4t@IPEB1.johnshopkins.edu>
To: Farhan Khan
Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Yeah, docx reader support was only added in 1.13. So it was trying to read raw binary data as HTML or something.
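
For anyone else hitting this, here is a rough sketch of how a script could check the installed version before deciding which route to take. The function names (version_ge, pandoc_version, docx_to_markdown) are made up for illustration, the version comparison assumes a `sort` that supports `-V` (GNU coreutils does), and the fallback branch is simply the unoconv pipeline quoted below. Treat it as one possible approach rather than anything official.

```shell
#!/bin/sh
# Illustrative helpers; none of these names come from pandoc itself.

# Succeeds when dotted version $1 is at least $2.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version number from the first line of `pandoc -v`,
# e.g. "pandoc 1.12.2.1" becomes "1.12.2.1".
pandoc_version() {
    pandoc -v | head -n 1 | awk '{print $2}'
}

# Convert a .docx file ($1) to Markdown ($2).
# With pandoc 1.13 or later, use the docx reader directly;
# otherwise fall back to the unoconv pipeline from this thread.
docx_to_markdown() {
    src=$1
    dst=$2
    if version_ge "$(pandoc_version)" 1.13; then
        pandoc -f docx -t markdown -o "$dst" "$src"
    else
        unoconv --stdout -f html "$src" | pandoc -f html -t markdown -o "$dst"
    fi
}
```

After sourcing the file, `docx_to_markdown test.docx test.md` would pick the direct route on 1.13 or newer and the unoconv route on something like the 1.12.2.1 shown below.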

On Dec 31, 2014 4:54 PM, Farhan Khan <khanzf@gmail.com> wrote:
My version information below:

I am using Ubuntu 14.04.

$ pandoc -v
pandoc 1.12.2.1
Compiled with texmath 0.6.5.2, highlighting-kate 0.5.5.1.
Syntax highlighting is supported for the following languages:
    actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
    clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
    diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
    fortran, fsharp, gnuassembler, go, haskell, haxe, html, ini, java, javadoc,
    javascript, json, jsp, julia, latex, lex, literatecurry, literatehaskell,
    lua, makefile, mandoc, markdown, matlab, maxima, metafont, mips, modelines,
    modula2, modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml,
    octave, pascal, perl, php, pike, postscript, prolog, python, r,
    relaxngcompact, rhtml, roff, ruby, rust, scala, scheme, sci, sed, sgml, sql,
    sqlmysql, sqlpostgresql, tcl, texinfo, verilog, vhdl, xml, xorg, xslt, xul,
    yacc, yaml
Default user data directory: /home/farhan/.pandoc
Copyright (C) 2006-2013 John MacFarlane
This is free software; see the source for copying conditions.  There is no
warranty, not even for merchantability or fitness for a particular purpose.


On Wed, Dec 31, 2014 at 7:05 AM, Jesse Rosenthal <jrosenthal-4GNroTWusrE@public.gmane.org> wrote:

Hi,

Farhan <khanzf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Sorry to answer my own question, but a few hours of Googling got me the
> answer. You can use the tool unoconv to accomplish this task:
>
> unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md
>
> Hope this helps the next guy!

That still seems weird -- are you sure you're using a pandoc version that actually supports reading docx?

What's the output of `pandoc -v`?
