* UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31 8:54 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Hi,
I can convert a markdown file to docx:
$ pandoc test.md -o test.docx
The resulting file "test.docx" opens just fine with both MS Word 2013 and
LibreOffice. However, when I attempt to convert that same resultant Docx
file back to markdown, I get an error:
$ pandoc test.docx -t markdown -o test.md
pandoc: Cannot decode byte '\x9f': Data.Text.Encoding.Fusion.streamUtf8:
Invalid UTF-8 stream
I get the same error when I try to convert Docx to any other format as
well, such as HTML. The issue seems to be that there is an invalid UTF-8
character. Is there a way to resolve this issue? Please let me know.
Thanks!
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9d171289-7a60-4ea0-907d-333e4cfba86e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31 8:56 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
I forgot to mention, when I attempt to use iconv, as the documentation
suggests, I receive a similar error:
$ iconv -t utf-8 test.docx
P)Ficonv: illegal input sequence at position 12
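A note on why iconv cannot help here: a .docx file is a ZIP container of XML parts, not a text file, so there is no text encoding for iconv to convert. Below is a minimal sketch of a check for this, assuming a POSIX shell with head, od and tr available. The helper name looks_like_zip is made up for illustration.

```shell
# Hypothetical helper: succeed if the file begins with the ZIP
# local-file-header signature, the bytes "PK\003\004".
looks_like_zip() {
    # Read the first four bytes and print them as plain hex digits.
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "504b0304" ]
}

# Usage: report whether test.docx is a ZIP container rather than text.
if [ -f test.docx ]; then
    if looks_like_zip test.docx; then
        echo "test.docx is a ZIP container, not a text file"
    else
        echo "test.docx does not start with a ZIP signature"
    fi
fi
```

If the check succeeds, the file has to be read by a tool that understands the docx format itself; re-encoding its bytes will not work.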
* Re: UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31 9:54 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Sorry to answer my own question, but a few hours of Googling got me the
answer. You can use the tool unoconv to convert the Docx file to HTML
first, and then have pandoc convert that HTML to markdown:
$ unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md
Hope this helps the next guy!
* Re: UTF-8 error when converting Docx to Markdown
From: Jesse Rosenthal @ 2014-12-31 12:05 UTC (permalink / raw)
To: Farhan, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Hi,
Farhan <khanzf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Sorry to answer my own question, but a few hours of Googling got me the
> answer. You can use the tool unoconv to accomplish this task:
>
> unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md
>
> Hope this helps the next guy!
That still seems weird -- are you sure you're using a pandoc version
that actually supports reading docx?
What's the output of `pandoc -v`?
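For context on the version question: as far as I can tell, the docx reader first shipped in pandoc 1.13, so older releases can write docx but cannot read it. Below is a minimal sketch of a check, assuming a POSIX shell plus GNU `sort -V`. The helper names pandoc_version and version_at_least are invented for illustration and are not part of pandoc.

```shell
# Hypothetical helper: print the version number found on the first
# line of `pandoc -v` output, which is read from stdin.
pandoc_version() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on `sort -V` (version sort), a GNU coreutils extension.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage, assuming pandoc is on PATH:
#   v=$(pandoc -v | pandoc_version)
#   version_at_least "$v" 1.13 || echo "pandoc $v predates the docx reader"
```

The 1.13 threshold is my recollection of the release notes and is worth confirming against the changelog for the installed package.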
* Re: UTF-8 error when converting Docx to Markdown
From: Farhan Khan @ 2014-12-31 21:54 UTC (permalink / raw)
To: Jesse Rosenthal; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
My version information below:
I am using Ubuntu 14.04.
$ pandoc -v
pandoc 1.12.2.1
Compiled with texmath 0.6.5.2, highlighting-kate 0.5.5.1.
Syntax highlighting is supported for the following languages:
actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
fortran, fsharp, gnuassembler, go, haskell, haxe, html, ini, java, javadoc,
javascript, json, jsp, julia, latex, lex, literatecurry, literatehaskell,
lua, makefile, mandoc, markdown, matlab, maxima, metafont, mips, modelines,
modula2, modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml,
octave, pascal, perl, php, pike, postscript, prolog, python, r,
relaxngcompact, rhtml, roff, ruby, rust, scala, scheme, sci, sed, sgml, sql,
sqlmysql, sqlpostgresql, tcl, texinfo, verilog, vhdl, xml, xorg, xslt, xul,
yacc, yaml
Default user data directory: /home/farhan/.pandoc
Copyright (C) 2006-2013 John MacFarlane
Web: http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
* Re: UTF-8 error when converting Docx to Markdown
From: Jesse Rosenthal @ 2014-12-31 22:34 UTC (permalink / raw)
To: Farhan Khan; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
[-- Attachment #1: Type: text/html, Size: 3998 bytes --]