HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"

public inbox archive for pandoc-discuss@googlegroups.com

* HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21  0:29 Heck Lennon
From: Heck Lennon @ 2020-04-21  0:29 UTC
  To: pandoc-discuss


Hello


On Windows 7 (32-bit), I'm trying to convert a ~450-page PDF into EPUB.


1. I used "mutool draw" to convert the PDF into a single ~10 MB HTML file, then ran:


pandoc -f html -t epub3 -o output.epub input.html

(~10 min wait on my sluggish computer)

It failed with "Out of memory".


2. Next, I reran "mutool draw" to produce one HTML file per PDF page, then ran:


pandoc -o output.epub *.html

pandoc: *.html: openBinaryFile: invalid argument (Invalid argument) 


3. Finally, I tried to concatenate all the HTML files with pandoc, but still got an
"openBinaryFile: invalid argument (Invalid argument)" error.


pandoc *.html > full.html

pandoc: *.html: openBinaryFile: invalid argument (Invalid argument)


What do you suggest I try?


Thank you.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/cfd086c1-9fe5-41bd-b735-3cd8db7579d9%40googlegroups.com.


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21  5:40 John MacFarlane
From: John MacFarlane @ 2020-04-21  5:40 UTC
  To: Heck Lennon, pandoc-discuss


That's extremely strange.  Your shell should be expanding the *
in *.html before it even gets to pandoc.  So if pandoc can see
the *, your shell hasn't done what it's supposed to.

What OS are you using, and what version of pandoc?
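On a POSIX shell, `*.html` becomes a list of file names before pandoc starts. cmd.exe leaves the pattern as-is, and the pandoc build used here evidently does not expand it itself, so it tries to open a file literally named `*.html`. One workaround, sketched below assuming Python 3 is available, is to expand the pattern yourself and hand pandoc the explicit list. The helper names are hypothetical, and the numeric sort assumes page files named like page1.html, page2.html, page10.html.

```python
import glob
import re
import subprocess
from pathlib import Path


def natural_key(path):
    """Sort key that puts page2.html before page10.html."""
    parts = re.split(r"(\d+)", Path(path).name)
    # re.split with a capture group alternates text and digit runs,
    # so odd positions are always digit runs: compare those as integers.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def expand_inputs(patterns):
    """Expand wildcards ourselves, the way a POSIX shell would before exec."""
    files = []
    for pattern in patterns:
        matches = glob.glob(pattern)
        if not matches:
            raise FileNotFoundError(f"no files match {pattern!r}")
        files.extend(sorted(matches, key=natural_key))
    return files


def pandoc_command(patterns, output, to_format="epub"):
    """Build a pandoc argument list with every input file named explicitly."""
    return ["pandoc", "-f", "html", "-t", to_format, "-o", output,
            *expand_inputs(patterns)]


def convert(patterns, output, to_format="epub"):
    # List argument, no shell=True: the expanded names reach pandoc verbatim.
    subprocess.run(pandoc_command(patterns, output, to_format), check=True)
```

For example, `convert(["*.html"], "output.epub")` run from the folder holding the per-page files. Windows caps a process command line at 32,767 characters, which a few hundred short page names should fit within.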


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21 10:10 Heck Lennon
From: Heck Lennon @ 2020-04-21 10:10 UTC
  To: pandoc-discuss


It's Windows 7 (32-bit) and pandoc 2.9.2.1.


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21 10:52 Heck Lennon
From: Heck Lennon @ 2020-04-21 10:52 UTC
  To: pandoc-discuss


Following this thread:
https://groups.google.com/d/msg/pandoc-discuss/eMfCGU3Gn8E/bEhyLpUYBAAJ
I named the batch file pandoc.cmd and re-ran the command like this:

echo output.epub | pandoc *.html -

It runs for a few minutes and ends by displaying some HTML, but no .epub file
is created.

I assume I'm not using the command correctly. Can pandoc read from standard
input?



* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21 18:21 John MacFarlane
From: John MacFarlane @ 2020-04-21 18:21 UTC
  To: Heck Lennon, pandoc-discuss


Yes, pandoc can use stdin (see the manual), but only when
files aren't explicitly specified on the command line.  You
can't use stdin AND name input files, as in your batch file.

I have no idea why you wouldn't be getting shell expansion
of *.html; maybe someone who uses Windows could comment.
Are there perhaps special characters or spaces in your
.html file names?  Try

pandoc "*.html"
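A minimal sketch of the stdin route, assuming Python 3 and a single HTML input: the document goes in on stdin, no input file names appear on the command line, and the output file is still named with `-o` (piping a file name into pandoc would make it part of the document, not the output path). Without `-o`, pandoc writes its result to stdout, which is one way converted HTML can end up on the screen. `stdin_command` and `convert_via_stdin` are hypothetical helpers, and the `run` parameter exists only so the subprocess call can be replaced in a test.

```python
import subprocess


def stdin_command(output, from_format="html", to_format="epub"):
    """pandoc arguments with no input files, so pandoc reads standard input."""
    return ["pandoc", "-f", from_format, "-t", to_format, "-o", output]


def convert_via_stdin(html_path, output, run=subprocess.run):
    """Send one HTML file to pandoc on stdin; the EPUB is written via -o."""
    with open(html_path, "rb") as fh:
        document = fh.read()
    # The raw bytes go to pandoc's stdin unchanged (pandoc expects UTF-8);
    # the result goes to the file named by -o, not to stdout.
    return run(stdin_command(output), input=document, check=True)
```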


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21 19:40 Anders Eriksson DC
From: Anders Eriksson DC @ 2020-04-21 19:40 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



On 2020-04-21 20:21, John MacFarlane wrote:
> Yes, pandoc can use stdin (see the manual), but only when
> files aren't explicitly specified on the command line.  You
> can't use stdin AND name input files, as in your batch file.
>
> I have no idea why you wouldn't be getting shell expansion
> of *.html; maybe someone who uses Windows could comment.
> Are there perhaps special characters or spaces in your
> .html file names?  Try
>
> pandoc "*.html"
The Windows command prompt (cmd.exe) doesn't do shell (wildcard) expansion!
You need to build the file list yourself, e.g. with a batch file or a PowerShell command.
More info here:
https://superuser.com/questions/460598/is-there-any-way-to-get-the-windows-cmd-shell-to-expand-wildcard-paths

// Anders

-- 
Anders Eriksson
Software Engineer
DC Lasersystem AB
Norsborsgsvägen 3
S-145 90  NORSBORG
SWEDEN
+46 (0)73 029 45 74  (Mobile)
anders-gxScWWR+X5axBppYLAEN5wC/G2K4zDHf@public.gmane.org


* HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-21 23:25 Kolen Cheung
From: Kolen Cheung @ 2020-04-21 23:25 UTC
  To: pandoc-discuss

A side note: since your goal is to convert PDF to EPUB, you will probably get better results with other tools. For example, I know a PDF can be converted to DOCX, and then DOCX to EPUB; there may also be tools that convert it directly. Whatever tools you choose, you'd want the ones that preserve the most information. Pandoc focuses mainly on the structure of the document, so much of the other information would be lost. The choice also depends on which tools you're comfortable with; e.g. the PDF-to-DOCX step I mentioned can probably be done with Adobe Acrobat or MS Word, but they are proprietary and difficult to run from the command line.

In your case, since you've already used a tool to convert the PDF to HTML, the HTML-to-EPUB step can probably be done better by other engines (the two formats are closely related). Maybe you can try Calibre, which also has a CLI.
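Calibre's command-line converter is `ebook-convert`, which takes an input file and an output file and infers the formats from their extensions. A minimal Python sketch, assuming Calibre is installed and `ebook-convert` is on the PATH; `calibre_command` and `calibre_convert` are hypothetical wrappers, and no extra conversion options are passed:

```python
import shutil
import subprocess


def calibre_command(src, dst, exe="ebook-convert"):
    """Arguments for Calibre's CLI: ebook-convert INPUT OUTPUT."""
    return [exe, src, dst]


def calibre_convert(src, dst):
    exe = shutil.which("ebook-convert")
    if exe is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    # Calibre picks the input and output formats from the file extensions.
    subprocess.run(calibre_command(src, dst, exe), check=True)
```

With the single-file HTML from mutool, this would be `calibre_convert("input.html", "output.epub")`.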


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-22  9:55 Heck Lennon
From: Heck Lennon @ 2020-04-22  9:55 UTC
  To: pandoc-discuss


Thanks, everyone, for the info!


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-22 12:30 Heck Lennon
From: Heck Lennon @ 2020-04-22 12:30 UTC
  To: pandoc-discuss


Since I had a Linux host available, I worked around the Windows shell-expansion
issue by running pandoc there:

pandoc -f html -t epub3 -o output.epub input.html


pandoc ran successfully (no error message), but the EPUB can't be opened in
a Windows GUI application that supports EPUB files ("Error loading
file.epub"). I also can't open the file as an archive after changing its
extension from .epub to .zip.

Here are the input files (HTML + PNGs):

https://we.tl/t-5EeGXML1rb

Do I need extra options on the command line?
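Renaming a valid EPUB to .zip should always give a readable archive, because an EPUB is a ZIP container with a fixed layout: the first entry must be an uncompressed file named `mimetype` containing `application/epub+zip`, and `META-INF/container.xml` must be present. A quick way to check whether the produced file is damaged, sketched in Python; `check_epub_container` is a hypothetical helper that tests only these basic container rules and is not a full validator such as EPUBCheck:

```python
import zipfile


def check_epub_container(path):
    """Return a list of container problems found in an EPUB (empty if none)."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive"]
    problems = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            first = infos[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if zf.read(first) != b"application/epub+zip":
                problems.append("'mimetype' does not contain application/epub+zip")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
        bad_member = zf.testzip()  # name of the first corrupt member, or None
        if bad_member is not None:
            problems.append(f"CRC error in {bad_member}")
    return problems
```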


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-22 15:58 John MacFarlane
From: John MacFarlane @ 2020-04-22 15:58 UTC
  To: Heck Lennon, pandoc-discuss


What pandoc version are you running on the Linux box?
This works fine for me.



* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-22 21:59 Heck Lennon
From: Heck Lennon @ 2020-04-22 21:59 UTC
  To: pandoc-discuss


pandoc 2.5.2 on Ubuntu 19.10.

It turns out I had to use "-t epub" instead of "-t epub3":

pandoc -f html -t epub -o output.epub input.html

Thank you.


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-22 22:17                       ` Kolen Cheung
From: Kolen Cheung @ 2020-04-22 22:17 UTC
  To: pandoc-discuss

Your version is too old. Try to reproduce the problem with the latest
version: https://github.com/jgm/pandoc/releases/latest

There are various ways to install it; e.g., you can just extract
pandoc-2.9.2.1-linux-amd64.tar.gz and put pandoc and pandoc-citeproc
somewhere in your PATH, such as ~/.local/bin.

(To go one step further, you can download the latest nightly build from
GitHub Actions to make sure the problem has not already been fixed.)

In general, you want to make sure the problem has not already been fixed,
and for that you need the latest version. Unfortunately, that is often a
problem on distros with a package manager, because people tend to use the
packaged version, which is often outdated (especially on Ubuntu).
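
The tarball route suggested above can be scripted. Below is a minimal POSIX sh sketch; the function name `install_pandoc_tarball` is hypothetical, and it assumes the archive unpacks to a single top-level directory with the executables under `bin/` (for 2.9.x, `pandoc` and `pandoc-citeproc`). Check the layout of the archive you actually download with `tar -tzf` first.

```shell
#!/bin/sh
# Hypothetical helper: copy the executables from a pandoc release tarball
# into a user directory (default ~/.local/bin) without touching the
# distro's package. Assumes the layout <top-level dir>/bin/<executables>.
install_pandoc_tarball() {
    tarball=$1
    dest=${2:-"$HOME/.local/bin"}
    tmp=$(mktemp -d) || return 1
    # Unpack into a scratch directory first.
    tar -xzf "$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }
    # Copy everything under bin/ (pandoc, pandoc-citeproc) to the target.
    mkdir -p "$dest" &&
        cp "$tmp"/*/bin/* "$dest"/
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Usage: `install_pandoc_tarball pandoc-2.9.2.1-linux-amd64.tar.gz`, then make sure `~/.local/bin` comes first on your `PATH` (e.g. `export PATH="$HOME/.local/bin:$PATH"`) and confirm with `pandoc --version` that the new binary is the one being picked up.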

On Wednesday, April 22, 2020 at 2:59:38 PM UTC-7, Heck Lennon wrote:
>
> pandoc 2.5.2 on Ubuntu 19.10.
>
> Turns out I had to use "-t epub" instead of "-t epub3" :
>
> pandoc -f html -t epub -o output.epub input.html
>
> Thank you.


* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-23 14:53                           ` Heck Lennon
From: Heck Lennon @ 2020-04-23 14:53 UTC
  To: pandoc-discuss

Thanks for the tip.

1. Removed the Ubuntu 2.5-2 package with apt-get remove
2. Downloaded and installed pandoc-2.9.2.1-1-amd64.deb
3. Ran:
   pandoc -f html -t epub3 -o output.epub3.epub input.html
   pandoc -f html -t epub -o output.epub2.epub input.html
4. Opened each file in Windows with SumatraPDF (which supports EPUB): both
   opened OK.

The remaining issue is how to convert the PDF to HTML better, but that has
nothing to do with pandoc.

Thank you all!
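
Step 3 above can be wrapped in a small helper so both variants are rebuilt from the same input after every upgrade. This is a minimal sketch; `build_both_epubs` is a hypothetical name. One caveat: per the pandoc 2.x manual, the `epub` output format means EPUB v3, and EPUB v2 is requested with `epub2`, so this variant uses `-t epub2` for the second file to make the comparison meaningful.

```shell
#!/bin/sh
# Hypothetical helper: build an EPUB 3 and an EPUB 2 from the same HTML so
# both can be opened in a reader (e.g. SumatraPDF) and compared.
# In pandoc 2.x, "-t epub" already produces EPUB 3; EPUB 2 needs "-t epub2".
build_both_epubs() {
    input=${1:-input.html}
    pandoc -f html -t epub3 -o output.epub3.epub "$input" &&
        pandoc -f html -t epub2 -o output.epub2.epub "$input"
}
```

Run it as `build_both_epubs input.html` after installing the new .deb (e.g. `sudo apt-get remove pandoc` followed by `sudo dpkg -i pandoc-2.9.2.1-1-amd64.deb`), and check `pandoc --version` first so you know which binary is in use.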



* Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
@ 2020-04-29  0:44                               ` Kolen Cheung
From: Kolen Cheung @ 2020-04-29  0:44 UTC
  To: pandoc-discuss

I think you could still ask here. pandoc-discuss is fairly casual, and as
long as you make that clear, I think people are OK with it. There are some
experts on format conversion here (for obvious reasons).

And if you still want to use pandoc as part of the "from PDF" workflow, then
it is very relevant here. Essentially, pandoc can take many formats as input,
and many other programs can read PDF and write one of the formats pandoc
understands. The question then is which intermediate format and which
software are best to use.

For example, I have tried converting with Acrobat/Word to docx and then
passing the result to pandoc.

For figures that are PDFs, I converted them to SVG with pdf2svg and included
them inline as images (including a PDF directly as an image also works for
some output formats).
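
The two-stage route described above might look like this as a shell sketch. It assumes the PDF has already been exported to a .docx with another tool (Acrobat or Word, as mentioned), and that your `pdf2svg` build accepts `all` as the page argument, replacing `%d` in the output name with the page number. The function names and default file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a "PDF -> intermediate format -> pandoc" workflow.

# Convert every page of a PDF figure file to its own SVG.
# Assumes a pdf2svg version that accepts "all" and expands %d per page.
pdf_figures_to_svg() {
    pdf2svg "$1" "${2:-figure-%d.svg}" all
}

# Let pandoc read the intermediate .docx (exported from the PDF by another
# tool) and write an EPUB 3 book.
docx_to_epub() {
    pandoc -f docx -t epub3 -o "${2:-book.epub}" "$1"
}
```

For example, `pdf_figures_to_svg figures.pdf` followed by `docx_to_epub book.docx book.epub`. The SVGs still have to be referenced from the document by hand; the sketch only automates the two conversions.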



Thread overview: 14+ messages
2020-04-21  0:29 HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)" Heck Lennon
2020-04-21  5:40   ` John MacFarlane
2020-04-21 10:10       ` Heck Lennon
2020-04-21 10:52           ` Heck Lennon
2020-04-21 18:21               ` John MacFarlane
2020-04-21 19:40                   ` Anders Eriksson DC
2020-04-21 23:25   ` Kolen Cheung
2020-04-22  9:55       ` Heck Lennon
2020-04-22 12:30           ` Heck Lennon
2020-04-22 15:58               ` John MacFarlane
2020-04-22 21:59                   ` Heck Lennon
2020-04-22 22:17                       ` Kolen Cheung
2020-04-23 14:53                           ` Heck Lennon
2020-04-29  0:44                               ` Kolen Cheung
