public inbox archive for pandoc-discuss@googlegroups.com
* can I convert a pdf to html or docx using pandoc?
@ 2014-02-26 14:59 Cifer Lee
       [not found] ` <4c35090f-98fc-45e6-93d4-a1c08b313c2f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Cifer Lee @ 2014-02-26 14:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I have converted a Markdown file to PDF.

I want to know whether I can also convert a PDF to HTML, Markdown, or docx.

I ran:

     pandoc -o test.html test.pdf

but I got the following error:

pandoc: test.pdf: hGetContents: invalid argument (invalid byte sequence)


So, can pandoc convert PDF to other formats?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4c35090f-98fc-45e6-93d4-a1c08b313c2f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


* Re: can I convert a pdf to html or docx using pandoc?
       [not found] ` <4c35090f-98fc-45e6-93d4-a1c08b313c2f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2014-02-26 15:32   ` Joost Kremers
       [not found]     ` <87ios1x53g.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Joost Kremers @ 2014-02-26 15:32 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Wed, Feb 26 2014, Cifer Lee <mantianyu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> So, can pandoc convert PDF to other formats?

No, it can't. You can try pdftotext and pdftohtml, but keep in mind that
PDF is not really a document format but rather a page layout format, so
conversion can be problematic.
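
As a rough sketch of that route (not a tested recipe): extract the text with
pdftotext and let pandoc read the result as if it were Markdown. The function
name pdf_to_html is made up for illustration, and treating the extracted text
as Markdown is an assumption that will misfire on some documents. It relies
only on pdftotext writing to stdout when the output name is "-".

```shell
# Hypothetical helper: PDF -> plain text (pdftotext) -> HTML (pandoc).
# Assumes poppler's pdftotext and pandoc are both installed.
pdf_to_html() {
  src=$1
  dst=$2
  # Fail early with a clear message if either tool is missing.
  for tool in pdftotext pandoc; do
    command -v "$tool" >/dev/null 2>&1 || {
      echo "missing required tool: $tool" >&2
      return 127
    }
  done
  # "-" as the output name makes pdftotext write the text to stdout.
  # Assumption: the extracted text is close enough to Markdown for pandoc.
  pdftotext "$src" - | pandoc -f markdown -t html -o "$dst"
}
```

Usage would be something like `pdf_to_html test.pdf test.html`. Expect to lose
headings, lists, and emphasis, since none of that structure survives in the
extracted text.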

HTH

-- 
Joost Kremers
Life has its moments



* Re: can I convert a pdf to html or docx using pandoc?
       [not found]     ` <87ios1x53g.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>
@ 2014-02-26 15:47       ` Cifer Lee
  2014-02-26 16:09       ` Dirk Laurie
  1 sibling, 0 replies; 4+ messages in thread
From: Cifer Lee @ 2014-02-26 15:47 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Thanks!

On Wednesday, February 26, 2014 at 11:32:03 PM UTC+8, Joost wrote:
>
> On Wed, Feb 26 2014, Cifer Lee <mant...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > So, can pandoc convert PDF to other formats?
>
> No, it can't. You can try pdftotext and pdftohtml, but keep in mind that
> PDF is not really a document format but rather a page layout format, so
> conversion can be problematic.
>
> HTH 
>
> -- 
> Joost Kremers 
> Life has its moments 
>
>


* Re: can I convert a pdf to html or docx using pandoc?
       [not found]     ` <87ios1x53g.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>
  2014-02-26 15:47       ` Cifer Lee
@ 2014-02-26 16:09       ` Dirk Laurie
  1 sibling, 0 replies; 4+ messages in thread
From: Dirk Laurie @ 2014-02-26 16:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2014-02-26 17:32 GMT+02:00 Joost Kremers <joostkremers-97jfqw80gc6171pxa8y+qA@public.gmane.org>:

> On Wed, Feb 26 2014, Cifer Lee <mantianyu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> So, can pandoc convert PDF to other formats?
>
> No, it can't. You can try pdftotext and pdftohtml, but keep in mind that
> PDF is not really a document format but rather a page layout format, so
> conversion can be problematic.

Those tools can produce HTML documents that look quite a lot
like the PDF, but they can't reproduce the HTML from which the
PDF was made. In particular, they treat every end-of-line as a hard line break.
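
To illustrate the line-ending problem: extracted text usually needs its lines
rejoined into paragraphs before it is of much use. Here is a minimal sketch,
assuming blank lines in the extracted text mark paragraph boundaries (which is
not guaranteed for PDF output). The name unwrap_paragraphs is hypothetical,
and the filter does not repair words hyphenated across lines.

```shell
# Hypothetical filter: join hard-wrapped lines into one line per paragraph.
# Reads stdin, writes stdout; blank lines are taken as paragraph separators.
unwrap_paragraphs() {
  awk '
    # Blank (or whitespace-only) line: flush the paragraph collected so far.
    /^[ \t\f\r]*$/ {
      if (buf != "") { print buf; print ""; buf = "" }
      next
    }
    # Any other line: trim it and append it to the current paragraph.
    {
      gsub(/^[ \t\f\r]+|[ \t\f\r]+$/, "")
      if (buf == "") { buf = $0 } else { buf = buf " " $0 }
    }
    # Flush the final paragraph if the input did not end with a blank line.
    END { if (buf != "") print buf }
  '
}
```

It could be placed between extraction and conversion, for example
`pdftotext test.pdf - | unwrap_paragraphs > test.txt`. This only guesses at
paragraphs; it cannot tell a heading or a list item from ordinary text.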

Moreover, they use many HTML features that Pandoc does not
interpret, for example multi-file output, so putting their output into
Pandoc is disappointing.

The question is a bit like asking: can you recover a recipe from
tasting and chemically analyzing a stew?



