From: Heck Lennon
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
Date: Wed, 22 Apr 2020 05:30:46 -0700 (PDT)

Since I had a Linux host available, I used it to sidestep the issue with Windows and shell expansion.

pandoc -f html -t epub3 -o output.epub input.html


pandoc ran successfully (no error message), but the resulting EPUB can't be opened in a Windows GUI application that supports EPUB files ("Error loading file.epub"). Likewise, I can't open the file as an archive after changing its extension from .epub to .zip.

Here are the input files (HTML + PNGs):

https://we.tl/t-5EeGXML1rb

Do I need extra options in the command line?
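
Since the file does not open even when renamed to .zip, one way to narrow this down is to check whether it is a readable ZIP container at all. Below is a minimal Python sketch of such a check, not part of pandoc: the function name check_epub_container is made up for illustration, and it only looks at the container basics (a readable archive, a leading 'mimetype' entry, META-INF/container.xml), not at the book's content.

```python
import zipfile


def check_epub_container(path):
    """Return a list of problems found in the EPUB's ZIP container (empty if none)."""
    # An EPUB is a ZIP archive; if this fails, the file is truncated or not a ZIP at all.
    if not zipfile.is_zipfile(path):
        return ["not a readable ZIP archive"]

    problems = []
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first one with a bad CRC, or None.
        bad = zf.testzip()
        if bad is not None:
            return [f"corrupt member: {bad}"]

        names = zf.namelist()
        # The EPUB container format expects 'mimetype' as the first entry.
        if not names or names[0] != "mimetype":
            problems.append("first entry is not 'mimetype'")
        elif zf.read("mimetype").strip() != b"application/epub+zip":
            problems.append("unexpected mimetype content")

        # The container must also point to the package document.
        if "META-INF/container.xml" not in names:
            problems.append("missing META-INF/container.xml")
    return problems
```

Calling check_epub_container("output.epub") on the file copied over from the Linux host would show whether the archive itself is damaged (for example, by the transfer) or whether the problem lies in its contents.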

On Wednesday, 22 April 2020 at 11:55:49 UTC+2, Heck Lennon wrote:
>
> Thanks everyone for the info!

> On Wednesday, 22 April 2020 at 01:25:21 UTC+2, Kolen Cheung wrote:
>>
>> A side note: since your goal is to convert from PDF to ePub, you will probably get better results with other tools. For example, I know a PDF can be converted to docx, and then from docx to ePub. There may be a tool that can do that conversion directly, too. Essentially, you want to choose the tool that preserves the most information. Since pandoc focuses mainly on the structure of the document, much other information would be lost. The choice of tool also depends on which ones you’re comfortable with; e.g., the PDF-to-docx step I mentioned can probably be done with Adobe Acrobat and MS Word, but they are proprietary and difficult to run from the command line.
>>
>> In your case, since a tool has already converted the files to HTML, the HTML-to-ePub step can be done better by some other engines (since the two formats are closely related). Maybe you can try Calibre, which also has a CLI.
