public inbox archive for pandoc-discuss@googlegroups.com
* error in conversion
@ 2012-11-27 19:19 Paulo Ney de Souza
       [not found] ` <50B51224.5080509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Paulo Ney de Souza @ 2012-11-27 19:19 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1087 bytes --]

  I have a file (A.doc, attached) that contains a single character (the 
letter A), which I can save as .txt or .html fine (using MS Word or 
OpenOffice), but when I try converting it with pandoc 1.9.4.2 I get the 
already known message:

pandoc: A.doc: hGetContents: invalid argument (invalid byte sequence)

I believe there is some offending character deep in the bowels of the 
metadata and whatnot of this file (it is 12 KB long), but the problem 
is: how does one go about locating it?
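
One way to locate such a byte, sketched here under the assumption that the error means the file is being decoded as UTF-8 text, is to attempt the same decoding and report where it first fails. The function name `first_invalid_utf8_offset` is made up for this illustration and is not part of pandoc:

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding.

    Returns None when the whole input is valid UTF-8.
    Hypothetical helper for illustration only; assumes the consumer
    expects UTF-8 text.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first undecodable byte.
        return err.start
    return None
```

Calling it on the raw bytes of the file, for example `first_invalid_utf8_offset(open("A.doc", "rb").read())`, would give an offset to inspect in a hex viewer. For a binary format the result mostly confirms that the file is not text at all, rather than pointing at one stray character.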

Will the development version help with solving this problem? I know it 
has something in the way of identifying the character, but I have not 
been able to get it installed lately.

Paulo Ney

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/groups/opt_out.



[-- Attachment #2: A.doc.doc --]
[-- Type: application/msword, Size: 12288 bytes --]


* Re: error in conversion
       [not found] ` <50B51224.5080509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-11-27 19:45   ` Joost Kremers
       [not found]     ` <87txsan83j.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Joost Kremers @ 2012-11-27 19:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Tue, Nov 27 2012, Paulo Ney de Souza <pauloney-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>   I have a file (A.doc, attached) that contains a single-character (the 
> letter: A), that I can save as .txt or .html fine (using MSWord or Open 
> Office) but that when I try converting with pandoc 1.9.4.2 I get the 
> already known message:

to the best of my knowledge, pandoc doesn't support .doc as an input
format. (or even as an output format, for that matter: it can output
.docx, but not .doc.)

if you want to convert a .doc file to text, you can try antiword, which
IME does a fairly good job.

-- 
Joost Kremers
Life has its moments



* Re: error in conversion
       [not found]     ` <87txsan83j.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>
@ 2012-11-27 20:26       ` Paulo Ney de Souza
       [not found]         ` <50B521FC.3060606-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Paulo Ney de Souza @ 2012-11-27 20:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

  Dear Joost,

I am sorry, I meant to write "docx" and include this version of the 
file. Everything else I said remains the same. The file contains only 
one letter (A.docx attached), which I can save as .txt or .html fine 
(using MS Word or OpenOffice), but when I try converting it with 
pandoc 1.9.4.2 I get the already known message:

   hGetContents: invalid argument (invalid byte sequence)

What I want to do is to be able to go from DocX to HTML, TeX, ... but it 
seems that small changes in the DocX file can render the conversion 
impossible.
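
For the narrow goal of getting the plain text out of a .docx without pandoc, a rough sketch is possible with the Python standard library alone, since a .docx is a ZIP archive whose main body lives in `word/document.xml`. The helper name `docx_paragraphs` is hypothetical, and the sketch ignores styles, tables, footnotes, and everything else a real converter would handle:

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx file.

    `source` may be a path or a binary file-like object.
    Hypothetical helper for illustration; only <w:t> text runs are read.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs that belong to this paragraph.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

The resulting list of strings could then be written out as plain text or Markdown and passed to pandoc for conversion to HTML or TeX.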

Paulo Ney






[-- Attachment #2: A.docx --]
[-- Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document, Size: 17650 bytes --]


* Re: error in conversion
       [not found]         ` <50B521FC.3060606-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-11-27 21:17           ` John MacFarlane
  0 siblings, 0 replies; 4+ messages in thread
From: John MacFarlane @ 2012-11-27 21:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Pandoc can produce docx, but not read docx.
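
A quick way to see why such files cannot be read as text is to look at their first bytes: a .docx is a ZIP container and a legacy .doc is an OLE2 compound file, both binary. The sketch below, with the hypothetical helper `sniff_container`, checks the well-known signatures of those two containers; it does not inspect anything beyond the first eight bytes:

```python
# Leading signature bytes of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header (.docx)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")     # OLE2 compound file (.doc)


def sniff_container(data: bytes) -> str:
    """Classify raw file bytes as "zip", "ole2", or "unknown".

    Hypothetical helper for illustration; a match only identifies the
    container, not whether the content is really a Word document.
    """
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"
```

If either signature matches, the input is a binary container, and a tool that expects UTF-8 text will presumably fail with a decoding error like the one reported in this thread.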




