public inbox archive for pandoc-discuss@googlegroups.com
* Re: UTF-8 error when converting Docx to Markdown
From: Jesse Rosenthal @ 2014-12-31 22:34 UTC
  To: Farhan Khan; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/html, Size: 3998 bytes --]


* Re: UTF-8 error when converting Docx to Markdown
From: Farhan Khan @ 2014-12-31 21:54 UTC
  To: Jesse Rosenthal; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

My version information is below:

I am using Ubuntu 14.04.

$ pandoc -v
pandoc 1.12.2.1
Compiled with texmath 0.6.5.2, highlighting-kate 0.5.5.1.
Syntax highlighting is supported for the following languages:
    actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
    clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
    diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
    fortran, fsharp, gnuassembler, go, haskell, haxe, html, ini, java, javadoc,
    javascript, json, jsp, julia, latex, lex, literatecurry, literatehaskell,
    lua, makefile, mandoc, markdown, matlab, maxima, metafont, mips, modelines,
    modula2, modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml,
    octave, pascal, perl, php, pike, postscript, prolog, python, r,
    relaxngcompact, rhtml, roff, ruby, rust, scala, scheme, sci, sed, sgml, sql,
    sqlmysql, sqlpostgresql, tcl, texinfo, verilog, vhdl, xml, xorg, xslt, xul,
    yacc, yaml
Default user data directory: /home/farhan/.pandoc
Copyright (C) 2006-2013 John MacFarlane
Web:  http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions.  There is no
warranty, not even for merchantability or fitness for a particular purpose.


On Wed, Dec 31, 2014 at 7:05 AM, Jesse Rosenthal <jrosenthal-4GNroTWusrE@public.gmane.org> wrote:

> That still seems weird -- are you sure you're using a pandoc version
> that actually supports reading docx?
>
> What's the output of `pandoc -v`?
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFd4kYAczfd6tfWYJBiJm47itpRzZQ6fY-B3y7k252N1Q11SZQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


* Re: UTF-8 error when converting Docx to Markdown
From: Jesse Rosenthal @ 2014-12-31 12:05 UTC
  To: Farhan, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Hi,

Farhan <khanzf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Sorry to answer my own question, but a few hours of Googling got me the
> answer. You can use the tool unoconv to accomplish this task:
>
> unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md
>
> Hope this helps the next guy!

That still seems weird -- are you sure you're using a pandoc version
that actually supports reading docx?

What's the output of `pandoc -v`?
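A minimal shell sketch for answering that question from a script: it looks only at the first line of `pandoc -v`. The function name `pandoc_has_docx_reader` is hypothetical, and the 1.13 cutoff is an assumption (the docx reader is assumed to have first shipped in pandoc 1.13, so a 1.12.x build could not read Docx at all).

```shell
# Hypothetical helper (not part of pandoc): succeed if the first line of
# `pandoc -v` reports version 1.13 or later. The 1.13 cutoff is an
# assumption about when the docx reader was introduced.
pandoc_has_docx_reader() {
  version=${1#pandoc }        # "pandoc 1.12.2.1" -> "1.12.2.1"
  major=${version%%.*}        # "1.12.2.1" -> "1"
  rest=${version#*.}          # "1.12.2.1" -> "12.2.1"
  minor=${rest%%.*}           # "12.2.1"   -> "12"
  # Any major version above 1 qualifies; for 1.x, require minor >= 13.
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 13 ]; }
}
```

For example, `pandoc_has_docx_reader "$(pandoc -v | head -n 1)" && echo "docx reader available"`. For the `pandoc 1.12.2.1` build reported in this thread, the check fails.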



* Re: UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31  9:54 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Sorry to answer my own question, but a few hours of Googling got me the
answer. You can use the tool unoconv to accomplish this task:

unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md

Hope this helps the next guy!
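The one-liner above can be wrapped in a small shell function. This is a sketch, not a tested tool: `docx_to_markdown` is a hypothetical name, and it assumes `unoconv` and `pandoc` are both on the `PATH`. It writes unoconv's HTML to a temporary file instead of piping it, because the exit status of a plain `a | b` pipeline is that of the last command, so a unoconv failure would otherwise go unnoticed.

```shell
# Hypothetical wrapper around the unoconv + pandoc workaround.
# Usage: docx_to_markdown input.docx output.md
docx_to_markdown() {
  src=$1
  dest=$2
  tmp=$(mktemp) || return 1
  # Convert the Docx file to HTML on stdout; stop if unoconv fails.
  if ! unoconv --stdout -f html "$src" > "$tmp"; then
    echo "docx_to_markdown: unoconv failed on $src" >&2
    rm -f "$tmp"
    return 1
  fi
  # Let pandoc read the HTML file and write Markdown to the destination.
  pandoc -f html -t markdown -o "$dest" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage: `docx_to_markdown test.docx test.md`.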





* Re: UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31  8:56 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I forgot to mention: when I attempt to use iconv, as the documentation
suggests, I receive a similar error:

$ iconv -t utf-8 test.docx
P)Ficonv: illegal input sequence at position 12
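The iconv error is expected: a .docx file is a ZIP archive of XML parts, not a plain-text file, so there is no text for iconv to re-encode (and when `-f` is omitted, iconv takes the source encoding from the current locale). A minimal check, sketched with a hypothetical `is_zip_container` helper and assuming `head -c` is available, is to look for the ZIP signature `PK` at the start of the file.

```shell
# Hypothetical helper: succeed if the file starts with the ZIP signature "PK".
# Docx (like the other Office Open XML formats) is a ZIP archive, so this is
# a quick way to tell a real .docx from a text file.
is_zip_container() {
  [ "$(head -c 2 "$1")" = "PK" ]
}
```

If `unzip` is installed, `is_zip_container test.docx && unzip -l test.docx` should list the archive's parts, typically including `word/document.xml`.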



* UTF-8 error when converting Docx to Markdown
From: Farhan @ 2014-12-31  8:54 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Hi,

I can convert a markdown file to docx:

$ pandoc test.md -o test.docx

The resulting file "test.docx" opens just fine in both MS Word 2013 and
LibreOffice. However, when I attempt to convert that same Docx file back
to Markdown, I get an error:

$ pandoc test.docx -t markdown -o test.md
pandoc: Cannot decode byte '\x9f': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

I get the same error when I try to convert the Docx file to any other
format, such as HTML. The issue seems to be an invalid UTF-8 character.
Is there a way to resolve this? Please let me know.

Thanks!
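The error message says pandoc was decoding the file as UTF-8 text when it hit byte 0x9f, a continuation byte that cannot start a UTF-8 character. That is consistent with a pandoc build that has no docx reader and reads the file as plain text instead (an assumption, but it matches the version question in Jesse Rosenthal's reply). A quick way to check whether a file is valid UTF-8, sketched with a hypothetical `is_valid_utf8` helper, is an identity conversion with iconv, which exits with a non-zero status at the first illegal byte sequence.

```shell
# Hypothetical helper: succeed only if the whole file is valid UTF-8.
# A UTF-8 -> UTF-8 conversion changes nothing for valid input, and iconv
# exits with a non-zero status at the first illegal byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For example, `is_valid_utf8 test.md` should succeed for the Markdown source, while `is_valid_utf8 test.docx` is expected to fail because the ZIP data is binary.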

