From mboxrd@z Thu Jan 1 00:00:00 1970
From: "cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
Newsgroups: gmane.text.pandoc
Subject: Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
Date: Tue, 27 Oct 2020 14:50:42 -0700 (PDT)
Message-ID: <22d3d478-357d-464c-b407-aefd2ed81dccn@googlegroups.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss
Precedence: list
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_244_1495619843.1603835442834"

------=_Part_244_1495619843.1603835442834
Content-Type: multipart/alternative; boundary="----=_Part_245_769283455.1603835442834"

------=_Part_245_769283455.1603835442834
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

With the nightly version (2.11.0.4)

    /tmp/pandoc -o epub/test.epub md/title.txt md2/ch*.md --css=css/stylesheet.css --epub-embed-font=fonts/* --epub-cover-image=images/cover.png

the conversion took seconds.

But pandoc complains that,

[WARNING] This document format requires a nonempty <title> element.
  Defaulting to 'title' as the title.
  To specify a title, use 'title' in metadata or --metadata title="...".

And epubcheck reports the following errors, probably related to the above warning:

ERROR(RSC-005): epub/test.epub/EPUB/content.opf(9,14): Error while parsing file: element "metadata" incomplete; missing required element "dc:title"
ERROR(RSC-005): epub/test.epub/EPUB/nav.xhtml(11,134): Error while parsing file: Anchors within nav elements must contain text

Check finished with errors
Messages: 0 fatal / 2 errors / 0 warnings / 0 info

epubcheck completed

The title.txt file contains:

% URBAIN DUBOIS
% La cuisine classique — Volume II

It looks as if pandoc is unable to process the content of the title.txt file.

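Until the cause is found, one possible workaround is the one the warning itself suggests: pass the title explicitly with --metadata. The following is only a minimal Python sketch of that idea, not a diagnosis of why title.txt is ignored. The helper names `read_title_block` and `metadata_args` are hypothetical and not part of pandoc, and the parser is deliberately simplified (no continuation lines, no multiple authors). Note that pandoc's title block convention reads the first `%` line as the title and the second as the author(s), so with the file as shown, "URBAIN DUBOIS" would be taken as the title.

```python
def read_title_block(text):
    """Return the leading '%' lines of a document as (title, author, date).

    Simplified assumption: one line per field, no continuation lines.
    Missing fields come back as empty strings.
    """
    fields = []
    for line in text.splitlines():
        if not line.startswith("%") or len(fields) == 3:
            break
        fields.append(line[1:].strip())
    fields += [""] * (3 - len(fields))
    return tuple(fields)


def metadata_args(title, author=""):
    """Build explicit --metadata arguments to append to a pandoc command."""
    if not title:
        raise ValueError("a non-empty title is required")
    args = ["--metadata", f"title={title}"]
    if author:
        args += ["--metadata", f"author={author}"]
    return args


# Placeholder values for illustration only.
title, author, _ = read_title_block("% Some Title\n% Some Author\n\nBody text\n")
print(metadata_args(title, author))
```

The resulting list can be appended to the argument list of the pandoc call shown above, for example when launching it through `subprocess.run`.
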
When I take a look at the output, everything looks good except that the raw LaTeX bits are now included verbatim, as if they were part of the text/data.

On Monday, October 26, 2020 at 5:16:00 PM UTC-4 John MacFarlane wrote:

>
> There are a few things that can trigger pathological behavior in
> the markdown parser.
>
> One way to find out what is to divide and conquer, converting
> shorter and shorter segments of your document to see if you can
> find where things get slow.
>
> Another possibility is to use --trace, which will give you
> very verbose output that will allow you to determine where
> excessive backtracking is occurring.
>
> If you don't need all pandoc extensions, and you're using recent
> pandoc, you might try `-f commonmark_x`, which uses the
> efficient commonmark parser extended with many (but not all)
> pandoc extensions. I would expect this to be much faster.
>
> Chris Jones <cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> > Six files... ~274,000 words. A pandoc conversion to EPUB last night took
> > almost 4 hours. Comparable conversions on the same hardware take at most a
> > couple of minutes.
> >
> > How can I investigate & hopefully optimize?
> >
> > --
> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/af5fe26b-4d84-4dcb-bdcd-6382469c476ao%40googlegroups.com.
>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/22d3d478-357d-464c-b407-aefd2ed81dccn%40googlegroups.com.

------=_Part_245_769283455.1603835442834--

------=_Part_244_1495619843.1603835442834--
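
The divide-and-conquer suggestion in the quoted reply could be automated roughly as follows. This is a minimal Python sketch, assuming `pandoc` is on the PATH. The helper names and the /tmp/probe.epub output path are hypothetical. It times one conversion per input file and ranks the files slowest first, so the worst chapter can then be split into smaller pieces and timed again.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def pandoc_argv(path, fmt="markdown"):
    """Build a single-file conversion command for the given reader format."""
    # The output file is a throwaway; /tmp/probe.epub is a hypothetical path.
    return ["pandoc", "-f", fmt, "-o", "/tmp/probe.epub", path]


def slowest_inputs(files, build_argv, timer=time_command):
    """Time one conversion per file; return (seconds, file) pairs, slowest first."""
    timings = [(timer(build_argv(f)), f) for f in files]
    return sorted(timings, reverse=True)
```

For example, `slowest_inputs(sorted(glob.glob("md2/ch*.md")), pandoc_argv)` would rank the chapters under the default markdown reader, and passing `lambda p: pandoc_argv(p, "commonmark_x")` instead would give comparable timings for the commonmark_x reader mentioned in the reply. The `timer` parameter exists only so the ranking logic can be exercised without running pandoc.
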