I think you could still ask people here. Pandoc Discuss is fairly casual, and as long as you make it clear that the question is not strictly about pandoc, I think people are OK with it. There are some experts on format conversion here (for obvious reasons).

Or, if you still want to use pandoc in part of the "from PDF" workflow, then it is very relevant here. Essentially, pandoc can take many formats as input, and many other programs can read PDF and write a format that pandoc understands. The question then is which intermediate format and software are best to use.

E.g. I have tried using Acrobat/Word to convert to docx and then passing the result to pandoc.
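
For what it's worth, the pandoc half of that docx route can be sketched as below. The function name and the file names are placeholders of mine, not anything from this thread, and the PDF to docx export itself still has to be done in Acrobat or Word first.

```shell
# Hypothetical helper: turn a docx (exported from Acrobat/Word) into an EPUB.
# Usage: docx_to_epub input.docx output.epub
docx_to_epub() {
    pandoc -f docx -t epub -o "$2" "$1"
}
```

Something like `docx_to_epub exported.docx book.epub` would then produce the EPUB, assuming pandoc is on your PATH.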

For PDF images, I converted them to SVG with pdf2svg and included them inline as images (including the PDF directly as an image is also OK for some output formats).
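
A rough sketch of the pdf2svg step, in case it helps. The helper name and file names are made up for illustration; it only assumes the usual `pdf2svg in.pdf out.svg` invocation and then prints a Markdown image line you could paste into the document.

```shell
# Hypothetical helper: convert a PDF figure to SVG next to the original,
# then print a Markdown image line that points at the new SVG.
# Usage: pdf_figure_to_svg figure.pdf
pdf_figure_to_svg() {
    svg="${1%.pdf}.svg"          # figure.pdf -> figure.svg
    pdf2svg "$1" "$svg" && printf '![](%s)\n' "$svg"
}
```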

On Thursday, April 23, 2020 at 7:53:02 AM UTC-7, Heck Lennon wrote:
Thanks for the tip.

1. Removed the Ubuntu 2.5-2 package through apt-get remove
2. Downloaded and installed pandoc-2.9.2.1-1-amd64.deb
3. Ran: pandoc -f html -t epub3 -o output.epub3.epub input.html AND pandoc -f html -t epub -o output.epub2.epub input.html
4. Opened each file in Windows with SumatraPDF (which supports epub): Both opened OK.
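
Steps 3 and 4 above can be roughly sketched as a small shell function. The function name is hypothetical, and the `unzip -tq` check is only a stand-in for opening the files in a reader: an EPUB is a zip container, so a file that fails this test will not open anywhere. Note also that, as far as I know, in pandoc 2.x plain `epub` is an alias for EPUB 3, so `-t epub2` would be the writer to pick if a real EPUB 2 file is wanted.

```shell
# Hypothetical helper: build both EPUBs from one HTML file (same commands
# as in step 3), then test that each output is a readable zip archive.
# Usage: html_to_both_epubs input.html
html_to_both_epubs() {
    pandoc -f html -t epub3 -o output.epub3.epub "$1" || return 1
    pandoc -f html -t epub -o output.epub2.epub "$1" || return 1
    for f in output.epub3.epub output.epub2.epub; do
        # unzip -t tests archive integrity without extracting anything
        unzip -tq "$f" >/dev/null || { echo "bad archive: $f"; return 1; }
    done
    echo "both EPUBs look like valid zip archives"
}
```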

The remaining issue is how to better convert PDF to HTML, but that has nothing to do with pandoc.

Thank you all!

On Thursday, April 23, 2020 at 00:17:54 UTC+2, Kolen Cheung wrote:
Version too old. Try to reproduce it using the latest version: https://github.com/jgm/pandoc/releases/latest There are various ways to install it, e.g. you can just unpack pandoc-2.9.2.1-linux-amd64.tar.gz and put pandoc and pandoc-citeproc somewhere in your path, such as ~/.local/bin
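
A minimal sketch of that tarball install, assuming the archive has a single top-level directory containing bin/ (which is how I understand the release tarballs to be laid out). The function name and the default prefix are my own choices for illustration.

```shell
# Hypothetical helper: unpack a pandoc release tarball into a prefix
# (default ~/.local) so that the binaries land in <prefix>/bin.
# Usage: install_pandoc_tarball pandoc-2.9.2.1-linux-amd64.tar.gz [prefix]
install_pandoc_tarball() {
    dest="${2:-$HOME/.local}"
    # --strip-components 1 drops the top-level pandoc-<version>/ directory
    mkdir -p "$dest" && tar xzf "$1" --strip-components 1 -C "$dest"
}
```

After that, pandoc should be found as long as the prefix's bin directory (e.g. ~/.local/bin) is in your PATH.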

(To go one step further, you can download the latest nightly build from GitHub Actions to make sure the problem has not already been fixed.)

In general, you'd want to make sure the problem has not already been fixed, and for that you need the latest version. Unfortunately, this can be a big problem on distros with a package manager, because people often just use the packaged version, which is too old, especially on Ubuntu.
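
A quick way to see which version you are actually running, as a sketch. The function name is hypothetical; it assumes the first line of `pandoc --version` has the form `pandoc <version>`.

```shell
# Hypothetical helper: print only the version number of the pandoc on PATH.
pandoc_version() {
    pandoc --version | awk 'NR == 1 { print $2 }'
}
```

Comparing that output with the version on the releases page tells you whether the distro package is lagging behind.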

On Wednesday, April 22, 2020 at 2:59:38 PM UTC-7, Heck Lennon wrote:
pandoc 2.5.2 on Ubuntu 19.10.

Turns out I had to use "-t epub" instead of "-t epub3":

pandoc -f html -t epub -o output.epub input.html

Thank you.

On Wednesday, April 22, 2020 at 17:58:39 UTC+2, John MacFarlane wrote:

What pandoc version are you running on the linux box?
This works fine for me.


Heck Lennon <frdt...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Since I had a Linux host available, I went around that issue with Windows
> and shell expansion.
>
> pandoc -f html -t epub3 -o output.epub input.html
>
>
> pandoc ran successfully (no error message), but the EPUB can't be opened in
> a Windows GUI application that supports EPUB files ("Error loading
> file.epub"). Likewise, I can't open the file after changing its extension
> from EPUB to ZIP.
>
> Here's the input files (HTML + PNGs):
>
> https://we.tl/t-5EeGXML1rb
>
> Do I need extra options in the command line?
>
> On Wednesday, April 22, 2020 at 11:55:49 UTC+2, Heck Lennon wrote:
>>
>> Thanks everyone for the info!
>>
>> On Wednesday, April 22, 2020 at 01:25:21 UTC+2, Kolen Cheung wrote:
>>>
>>> A side note: since your goal is to convert from PDF to ePub, you will
>>> probably get better results using other tools. E.g. I know a PDF can be
>>> converted to docx, and then from docx to ePub. There may be a tool that can
>>> do that conversion directly, too. Essentially, you’d want to choose the
>>> tools that preserve the most information. And since pandoc focuses mainly
>>> on the structure of the document, much other information would be lost. The
>>> choice of tool also depends on which ones you’re comfortable with. E.g. the
>>> PDF to docx conversion I mentioned can probably be done by Adobe Acrobat
>>> and MS Word, but they are proprietary and difficult to run from the
>>> command line.
>>>
>>> In your case, since you already have a tool that preconverted them to HTML,
>>> HTML to ePub can be done better by some other engines (since the two are
>>> closely related). Maybe you can try Calibre, which also has a CLI.
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b3218bbb-9846-4e52-b201-7e4a1b8b09d6%40googlegroups.com.
