* can I convert a pdf to html or docx using pandoc?
@ 2014-02-26 14:59 Cifer Lee
From: Cifer Lee @ 2014-02-26 14:59 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
I have converted a Markdown file to PDF.
Now I want to know whether I can convert a PDF to HTML, Markdown, or docx.
I ran:
pandoc -o test.html test.pdf
but I got the following error:
pandoc: test.pdf: hGetContents: invalid argument (invalid byte sequence)
so, can pandoc convert pdf to other formats?
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4c35090f-98fc-45e6-93d4-a1c08b313c2f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
* Re: can I convert a pdf to html or docx using pandoc?
From: Joost Kremers @ 2014-02-26 15:32 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
On Wed, Feb 26 2014, Cifer Lee <mantianyu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> so, can pandoc convert pdf to other formats?
No, it can't. You can try pdftotext and pdftohtml, but keep in mind that
PDF is not really a document format but rather a page layout format, so
conversion can be problematic.
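
A minimal sketch of the pdftotext route mentioned above, assuming pdftotext (from poppler-utils) and pandoc are both installed and on the PATH. The helper names `build_commands` and `convert` are made up for illustration, and treating the extracted text as markdown is only one possible choice; stray characters in the text may be read as markup.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_commands(pdf_path: str, out_path: str) -> List[List[str]]:
    """Return two argv lists: PDF -> plain text, then plain text -> out_path."""
    txt_path = str(Path(pdf_path).with_suffix(".txt"))
    return [
        # pdftotext extracts the text of the PDF into a plain-text file.
        ["pdftotext", pdf_path, txt_path],
        # pandoc reads that text as markdown (an assumption of this sketch);
        # the output format is inferred from the extension of out_path.
        ["pandoc", "-f", "markdown", "-o", out_path, txt_path],
    ]


def convert(pdf_path: str, out_path: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    for argv in build_commands(pdf_path, out_path):
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        subprocess.run(argv, check=True)
```

Calling `convert("test.pdf", "test.html")` would run both tools in turn. Expect only the text to survive; headings, lists, and emphasis from the original source are not recovered.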
HTH
--
Joost Kremers
Life has its moments
* Re: can I convert a pdf to html or docx using pandoc?
From: Cifer Lee @ 2014-02-26 15:47 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Thanks!
On Wednesday, February 26, 2014 at 11:32:03 PM UTC+8, Joost wrote:
>
> On Wed, Feb 26 2014, Cifer Lee <mant...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > so, can pandoc convert pdf to other formats?
>
> No, it can't. You can try pdftotext and pdftohtml, but keep in mind that
> PDF is not really a document format but rather a page layout format, so
> conversion can be problematic.
>
> HTH
>
> --
> Joost Kremers
> Life has its moments
>
>
* Re: can I convert a pdf to html or docx using pandoc?
From: Dirk Laurie @ 2014-02-26 16:09 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
2014-02-26 17:32 GMT+02:00 Joost Kremers <joostkremers-97jfqw80gc6171pxa8y+qA@public.gmane.org>:
> On Wed, Feb 26 2014, Cifer Lee <mantianyu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> so, can pandoc convert pdf to other formats?
>
> No, it can't. You can try pdftotext and pdftohtml, but keep in mind that
> PDF is not really a document format but rather a page layout format, so
> conversion can be problematic.
Those tools can produce HTML documents that look quite a lot
like the PDF, but they can't reproduce the HTML from which the
PDF was made. In particular, they treat all end-of-lines as hard.
Moreover, they use many HTML features that Pandoc does not
interpret, for example multi-file output, so putting their output into
Pandoc is disappointing.
The question is a bit like asking: can you recover a recipe from
tasting and chemically analyzing a stew?
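
To illustrate the hard end-of-lines point above, here is a rough sketch of a cleanup pass one might run on extracted text before handing it to Pandoc. The function name `unwrap_hard_breaks` is hypothetical, and the sketch assumes paragraphs are separated by blank lines, which extracted PDF text does not always guarantee.

```python
import re


def unwrap_hard_breaks(text: str) -> str:
    """Join the lines of each paragraph into a single line.

    Assumes paragraphs are separated by one or more blank lines.
    """
    # Split on blank lines (possibly containing only whitespace).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Collapse every run of whitespace, including newlines, to one space.
        joined.append(" ".join(para.split()))
    return "\n\n".join(joined)
```

This is a heuristic, not a recovery of the source: it cannot tell a wrapped line from an intentional break (list items, verse, code), and it does not rejoin words hyphenated across lines.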