From: Kolen Cheung
Newsgroups: gmane.text.pandoc
Subject: Re: HTML → EPUB: Either "Out of memory" or "openBinaryFile: invalid argument (Invalid argument)"
Date: Wed, 22 Apr 2020 15:17:54 -0700 (PDT)
Message-ID: <60dc6b96-7284-47e3-bbb2-938857c61dd5@googlegroups.com>
References: <879425ff-d491-4d0b-8ffe-db24ad9cce23@googlegroups.com> <14c0eaf0-b920-477c-a735-dded7f1df0c5@googlegroups.com> <026f695e-0849-4c01-969b-0c2ccbeb31b9@googlegroups.com>
To: pandoc-discuss

That version is too old. Try to reproduce the problem with the latest version:
https://github.com/jgm/pandoc/releases/latest
There are various ways to install it; for example, you can just unpack pandoc-2.9.2.1-linux-amd64.tar.gz and put pandoc and pandoc-citeproc somewhere in your PATH, such as ~/.local/bin
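
A rough sketch of the tarball route, in case it helps. It assumes the archive unpacks to a single pandoc-<version>/ directory with the executables under bin/ (worth confirming with tar -tzf first), and install_pandoc_tarball is only a made-up name for the helper, not something pandoc ships:

```shell
# Hypothetical helper (not part of pandoc): unpack a release tarball and copy
# its executables into a directory on PATH (default: ~/.local/bin).
# Assumes the archive has one top-level directory that contains bin/.
install_pandoc_tarball() {
    tarball=$1
    dest=${2:-"$HOME/.local/bin"}
    tmp=$(mktemp -d) || return 1
    # --strip-components=1 drops the top-level pandoc-<version>/ directory.
    if ! tar -xzf "$tarball" -C "$tmp" --strip-components=1; then
        rm -rf "$tmp"
        return 1
    fi
    # Copy everything found under bin/ (pandoc, pandoc-citeproc, ...).
    mkdir -p "$dest" && cp "$tmp"/bin/* "$dest"/
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage would be something like install_pandoc_tarball pandoc-2.9.2.1-linux-amd64.tar.gz; the new binary is only picked up if ~/.local/bin comes before the distro's directory in PATH.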

(To go one step further, you can download the latest nightly build from the project's GitHub Actions to make sure the problem has not already been fixed.)

In general, you want to make sure the problem has not already been fixed, and for that you need the latest version. Unfortunately, this is often a problem on distros with a package manager, because people tend to just use the packaged version, which is frequently too old, especially on Ubuntu.
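
To see which pandoc is actually being run and whether it is recent enough, something along these lines might do. This is only a sketch: version_at_least and pandoc_is_current are hypothetical helper names, and it relies on "sort -V" from GNU coreutils:

```shell
# Succeeds if version $1 is at least version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report which pandoc is first on PATH and whether it is
# at least the version passed as $1 (e.g. the latest release number).
pandoc_is_current() {
    wanted=$1
    pandoc_bin=$(command -v pandoc) || { echo "pandoc not found on PATH" >&2; return 2; }
    # The first line of "pandoc --version" has the form "pandoc <version>".
    installed=$(pandoc --version | head -n 1 | awk '{print $2}')
    echo "pandoc $installed at $pandoc_bin"
    version_at_least "$installed" "$wanted"
}
```

For example, pandoc_is_current 2.9.2.1 returns non-zero for a 2.5.2 install like the one reported in this thread, which is the cue to upgrade before filing a bug.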

On Wednesday, April 22, 2020 at 2:59:38 PM UTC-7, Heck Lennon wrote:
pandoc 2.5.2 on Ubuntu 19.10.

Turns out I had to use "-t epub" instead of "-t epub3":
pandoc -f html -t epub -o output.epub input.html

Thank you.

On Wednesday, 22 April 2020 at 17:58:39 UTC+2, John MacFarlane wrote:

What pandoc version are you running on the Linux box?
This works fine for me.


Heck Lennon writes:

> Since I had a Linux host available, I got around that issue with Windows
> and shell expansion.
>
> pandoc -f html -t epub3 -o output.epub input.html
>
>
> pandoc ran successfully (no error message), but the EPUB can't be opened in
> a Windows GUI application that supports EPUB files ("Error loading
> file.epub"). Likewise, I can't open the file after changing its extension
> from EPUB to ZIP.
>
> Here are the input files (HTML + PNGs):
>
> https://we.tl/t-5EeGXML1rb
>
> Do I need extra options in the command line?
>
> On Wednesday, 22 April 2020 at 11:55:49 UTC+2, Heck Lennon wrote:
>>
>> Thanks, everyone, for the info!
>>
>> On Wednesday, 22 April 2020 at 01:25:21 UTC+2, Kolen Cheung wrote:
>>>
>>> A side note: since your goal is to convert from PDF to ePub, you will
>>> probably get better results with other tools. For example, I know a PDF can
>>> be converted to docx, and then from docx to ePub. There may be a tool that
>>> can convert it directly, too. Essentially, you'd want to choose the tool
>>> that preserves the most information. Since pandoc focuses mainly on the
>>> structure of the document, much other information would be lost. The
>>> choice of tool also depends on which ones you're comfortable with. For
>>> example, the PDF-to-docx step I mentioned can probably be done with Adobe
>>> Acrobat and MS Word, but they are proprietary and difficult to run from
>>> the command line.
>>>
>>> In your case, since a tool has already converted them to HTML, the
>>> HTML-to-ePub step can probably be done better by some other engine (since
>>> the two formats are closely related). Maybe you can try Calibre, which
>>> also has a CLI.
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b3218bbb-9846-4e52-b201-7e4a1b8b09d6%40googlegroups.com.
