* Re: UTF-8 error when converting Docx to Markdown
From: Jesse Rosenthal @ 2014-12-31 22:34 UTC
To: Farhan Khan; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/html, Size: 3998 bytes --]

* UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31 8:54 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1.1: Type: text/plain, Size: 1199 bytes --]

Hi,

I can convert a markdown file to docx:

$ pandoc test.md -o test.docx

The resulting file "test.docx" opens just fine with both MS Word 2013 and
LibreOffice. However, when I attempt to convert that same resultant Docx
file back to markdown, I get an error:

$ pandoc test.docx -t markdown -o test.md
pandoc: Cannot decode byte '\x9f': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

I get the same error when I try to convert Docx to any other format as
well, such as HTML. The issue seems to be that there is an invalid UTF-8
character. Is there a way to resolve this issue? Please let me know.

Thanks!

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9d171289-7a60-4ea0-907d-333e4cfba86e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #1.2: Type: text/html, Size: 4646 bytes --]

* Re: UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31 8:56 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1.1: Type: text/plain, Size: 1485 bytes --]

I forgot to mention, when I attempt to use iconv, as the documentation
suggests, I receive a similar error:

$ iconv -t utf-8 test.docx
P)Ficonv: illegal input sequence at position 12

On Wednesday, December 31, 2014 3:54:11 AM UTC-5, Farhan wrote:
>
> Hi,
>
> I can convert a markdown file to docx:
>
> $ pandoc test.md -o test.docx
>
> The resulting file "test.docx" opens just fine with both MS Word 2013 and
> LibreOffice. However, when I attempt to convert that same resultant Docx
> file back to markdown, I get an error:
>
> $ pandoc test.docx -t markdown -o test.md
> pandoc: Cannot decode byte '\x9f': Data.Text.Encoding.Fusion.streamUtf8:
> Invalid UTF-8 stream
>
> I get the same error when I try to convert Docx to any other format as
> well, such as HTML. The issue seems to be that there is an invalid UTF-8
> character. Is there a way to resolve this issue? Please let me know.
>
> Thanks!
>

[-- Attachment #1.2: Type: text/html, Size: 4752 bytes --]

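A note on the iconv failure above: a .docx file is a ZIP container holding XML parts, not a text file, so iconv has no single text encoding to convert and can be expected to stop on the binary data. A minimal sketch for checking this from the shell follows. It assumes `head -c` is available, and `is_zip_container` is a hypothetical helper name chosen for illustration; it is not part of pandoc or iconv.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc or iconv): succeeds when the
# file begins with the two ZIP magic bytes "PK", as .docx files do.
is_zip_container() {
    # Read only the first two bytes; unreadable files yield an empty string.
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Example use (assumes a file named test.docx in the current directory):
#   if is_zip_container test.docx; then
#       echo "test.docx is a ZIP container; iconv cannot re-encode it"
#   fi
```

The check only looks at the leading bytes, so it says that a file is some kind of ZIP archive, not that it is a valid Word document.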
* Re: UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31 9:54 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1.1: Type: text/plain, Size: 1831 bytes --]

Sorry to answer my own question, but a few hours of Googling got me the
answer. You can use the tool unoconv to accomplish this task:

unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md

Hope this helps the next guy!

On Wednesday, December 31, 2014 3:56:51 AM UTC-5, Farhan wrote:
>
> I forgot to mention, when I attempt to use iconv, as the documentation
> suggests, I receive a similar error:
>
> $ iconv -t utf-8 test.docx
> P)Ficonv: illegal input sequence at position 12
>
> On Wednesday, December 31, 2014 3:54:11 AM UTC-5, Farhan wrote:
>>
>> Hi,
>>
>> I can convert a markdown file to docx:
>>
>> $ pandoc test.md -o test.docx
>>
>> The resulting file "test.docx" opens just fine with both MS Word 2013 and
>> LibreOffice. However, when I attempt to convert that same resultant Docx
>> file back to markdown, I get an error:
>>
>> $ pandoc test.docx -t markdown -o test.md
>> pandoc: Cannot decode byte '\x9f': Data.Text.Encoding.Fusion.streamUtf8:
>> Invalid UTF-8 stream
>>
>> I get the same error when I try to convert Docx to any other format as
>> well, such as HTML. The issue seems to be that there is an invalid UTF-8
>> character. Is there a way to resolve this issue? Please let me know.
>>
>> Thanks!
>>
>

[-- Attachment #1.2: Type: text/html, Size: 6085 bytes --]

* Re: UTF-8 error when converting Docx to Markdown
From: Jesse Rosenthal @ 2014-12-31 12:05 UTC
To: Farhan, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi,

Farhan <khanzf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Sorry to answer my own question, but a few hours of Googling got me the
> answer. You can use the tool unoconv to accomplish this task:
>
> unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md
>
> Hope this helps the next guy!

That still seems weird -- are you sure you're using a pandoc version
that actually supports reading docx?

What's the output of `pandoc -v`?

* Re: UTF-8 error when converting Docx to Markdown
From: Farhan Khan @ 2014-12-31 21:54 UTC
To: Jesse Rosenthal; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

My version information is below. I am using Ubuntu 14.04.

$ pandoc -v
pandoc 1.12.2.1
Compiled with texmath 0.6.5.2, highlighting-kate 0.5.5.1.
Syntax highlighting is supported for the following languages:
    actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c,
    changelog, clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css,
    curry, d, diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email,
    erlang, fortran, fsharp, gnuassembler, go, haskell, haxe, html, ini, java,
    javadoc, javascript, json, jsp, julia, latex, lex, literatecurry,
    literatehaskell, lua, makefile, mandoc, markdown, matlab, maxima, metafont,
    mips, modelines, modula2, modula3, monobasic, nasm, noweb, objectivec,
    objectivecpp, ocaml, octave, pascal, perl, php, pike, postscript, prolog,
    python, r, relaxngcompact, rhtml, roff, ruby, rust, scala, scheme, sci,
    sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, texinfo, verilog, vhdl, xml,
    xorg, xslt, xul, yacc, yaml
Default user data directory: /home/farhan/.pandoc
Copyright (C) 2006-2013 John MacFarlane
Web: http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

On Wed, Dec 31, 2014 at 7:05 AM, Jesse Rosenthal <jrosenthal-4GNroTWusrE@public.gmane.org> wrote:
>
> Hi,
>
> Farhan <khanzf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> > Sorry to answer my own question, but a few hours of Googling got me the
> > answer. You can use the tool unoconv to accomplish this task:
> >
> > unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md
> >
> > Hope this helps the next guy!
>
> That still seems weird -- are you sure you're using a pandoc version
> that actually supports reading docx?
>
> What's the output of `pandoc -v`?
>

[-- Attachment #2: Type: text/html, Size: 3694 bytes --]

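The version reported above is likely the key detail. To the best of my knowledge the docx reader first shipped in pandoc 1.13, so a 1.12.x binary would read test.docx as markdown text and fail on its binary bytes. A minimal sketch for checking an installed version against that threshold follows. It assumes GNU `sort -V` is available, and both function names are hypothetical helpers, not pandoc features; the 1.13 threshold is my assumption and is not stated anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the version from the first line of
# `pandoc -v` output supplied on standard input ("pandoc 1.12.2.1").
pandoc_version_from_banner() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: succeed when version $1 >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (assumes pandoc is on PATH; 1.13 is an assumed threshold):
#   v=$(pandoc -v | pandoc_version_from_banner)
#   if version_ge "$v" 1.13; then
#       pandoc -f docx -t markdown -o test.md test.docx
#   else
#       echo "pandoc $v probably cannot read docx; upgrade or use unoconv"
#   fi
```

Passing `-f docx` explicitly in the example makes an unsupported input format fail with a clear message instead of falling back to another reader.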
end of thread, other threads:[~2014-12-31 22:34 UTC | newest]

Thread overview: 6+ messages
2014-12-31 22:34 UTF-8 error when converting Docx to Markdown Jesse Rosenthal
-- strict thread matches above, loose matches on Subject: below --
2014-12-31  8:54 Farhan
2014-12-31  8:56 ` Farhan
2014-12-31  9:54   ` Farhan
2014-12-31 12:05     ` Jesse Rosenthal
2014-12-31 21:54       ` Farhan Khan