* pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  @ 2020-10-26 19:22 Chris Jones
  [not found] ` <af5fe26b-4d84-4dcb-bdcd-6382469c476ao-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Jones @ 2020-10-26 19:22 UTC
To: pandoc-discuss

Six files, ~274,000 words. A pandoc conversion to EPUB last night took almost 4 hours. Comparable conversions on the same hardware take at most a couple of minutes.

How can I investigate and, hopefully, optimize this?

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/af5fe26b-4d84-4dcb-bdcd-6382469c476ao%40googlegroups.com.
[parent not found: <af5fe26b-4d84-4dcb-bdcd-6382469c476ao-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <af5fe26b-4d84-4dcb-bdcd-6382469c476ao-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  @ 2020-10-26 21:15 ` John MacFarlane
  [not found] ` <m2a6w8ofib.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2020-10-26 21:15 UTC
To: Chris Jones, pandoc-discuss

There are a few things that can trigger pathological behavior in the markdown parser.

One way to find out what triggers it is to divide and conquer: convert shorter and shorter segments of your document until you find where things get slow.

Another possibility is to use --trace, which gives very verbose output that will let you determine where excessive backtracking is occurring.

If you don't need all pandoc extensions, and you're using a recent pandoc, you might try `-f commonmark_x`, which uses the efficient commonmark parser extended with many (but not all) pandoc extensions. I would expect this to be much faster.
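The divide-and-conquer approach described above can be automated. The following is only a minimal sketch, not something posted in the thread: the helper names `split_into_chunks` and `time_pandoc`, the chunk count of 8, and the 60-second timeout are all illustrative assumptions. It splits a markdown file at blank lines, which can cut through a LaTeX environment or fenced block that itself contains blank lines, so a slow chunk is a hint about where to look rather than a diagnosis.

```python
import subprocess
import time


def split_into_chunks(text, n_chunks):
    """Split markdown text into roughly equal chunks at blank-line boundaries."""
    blocks = [b for b in text.split("\n\n") if b.strip()]
    if not blocks:
        return []
    size = max(1, -(-len(blocks) // n_chunks))  # ceiling division
    return ["\n\n".join(blocks[i:i + size]) for i in range(0, len(blocks), size)]


def time_pandoc(chunk, timeout=60, pandoc="pandoc"):
    """Return the seconds pandoc needs to parse one chunk, or None on timeout."""
    start = time.monotonic()
    try:
        subprocess.run(
            [pandoc, "-f", "markdown", "-t", "native"],
            input=chunk.encode("utf-8"),
            stdout=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python time_chunks.py md/chapter.md
    with open(sys.argv[1], encoding="utf-8") as handle:
        text = handle.read()
    for i, chunk in enumerate(split_into_chunks(text, 8)):
        elapsed = time_pandoc(chunk)
        label = "TIMEOUT" if elapsed is None else f"{elapsed:.1f}s"
        print(f"chunk {i}: {label}  starts with {chunk[:40]!r}")
```

A chunk that times out can be fed back through the same script with a smaller chunk size to narrow the search further.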
[parent not found: <m2a6w8ofib.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <m2a6w8ofib.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  @ 2020-10-27 20:34 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  [not found] ` <e9e43a84-9ec5-4732-8dec-e6caac2e59ffn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2020-10-27 21:50 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  1 sibling, 1 reply; 13+ messages in thread
From: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-10-27 20:34 UTC
To: pandoc-discuss

> There are a few things that can trigger pathological behavior in
> the markdown parser.
>
> One way to find out what is to divide and conquer, converting
> shorter and shorter segments of your document to see if you can
> find where things get slow.

I sort of did that. I ran pandoc on each individual chapter, hoping to find one that took longer than the rest. Not much luck with this approach: they all took a long time.

> Another possibility is to use --trace, which will give you
> very verbose output that will allow you to determine where
> excessive backtracking is occurring.

It's been stuck for over 10 minutes (!) on this:

    ] Parsed [RawBlock (Format "tex") "\\begin{center}\n\\textbf{\\old158 at line 4926

The \old switch allows switching to oldstyle numbers on the fly:

    \newcommand{\old}{\addfontfeature{Numbers=OldStyle}}

Unfortunately the number is truncated (it could be 158{0..9}) and the line number does not tell me much. I did find a bug in my source in this vicinity (caused by a broken regex), but fixing it makes no difference.

Running:

    pandoc -o epub/test.epub md/title.txt md/md.* --css=css/stylesheet.css --epub-embed-font=fonts/* --trace

I'm again stuck in the exact same spot.

Ah, it has come unstuck, but now it's stuck on something else. Unfortunately I wasn't watching the trace when pandoc started rolling again.

> If you don't need all pandoc extensions, and you're using recent
> pandoc, you might try `-f commonmark_x`, which uses the
> efficient commonmark parser extended with many (but not all)
> pandoc extensions. I would expect this to be much faster.

Sounds good. Could you remind me how to install the "nightly" (hopefully a standalone version; I vaguely remember it's a statically linked program), or where that is documented? I have done it in the past, but that was over a year ago and I don't remember the details.
[parent not found: <e9e43a84-9ec5-4732-8dec-e6caac2e59ffn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <e9e43a84-9ec5-4732-8dec-e6caac2e59ffn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  @ 2020-10-27 21:05 ` John MacFarlane
  0 siblings, 0 replies; 13+ messages in thread
From: John MacFarlane @ 2020-10-27 21:05 UTC
To: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss

"cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <cjns1989-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> It's been stuck for over 10 minutes (!) on this:
>
> ] Parsed [RawBlock (Format "tex") "\\begin{center}\n\\textbf{\\old158 at
> line 4926

This got parsed. So it's stuck on whatever comes after this bit of raw TeX (only the first part is shown; presumably it's the whole center environment).
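Since the slow spot is whatever follows the last traced block, it can help to record when each trace line arrives instead of watching the terminal. The sketch below is an illustration only: the script name, the `longest_gaps` helper, and the assumption that pandoc writes its trace line by line to stderr are mine, not confirmed in the thread. It reads trace lines on standard input, timestamps them, and reports the longest pauses together with the line printed just before each pause.

```python
import sys
import time


def longest_gaps(stamped_lines, top=5):
    """Given a list of (timestamp, line) pairs in arrival order, return the
    `top` largest pauses as (gap_seconds, line_printed_before_the_pause)."""
    gaps = []
    for (t0, line), (t1, _next_line) in zip(stamped_lines, stamped_lines[1:]):
        gaps.append((t1 - t0, line))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__":
    stamped = []
    try:
        for line in sys.stdin:
            stamped.append((time.monotonic(), line.rstrip("\n")))
    except KeyboardInterrupt:
        pass  # report what was collected if the run is interrupted
    # Closing marker so that a pause after the last trace line is counted too.
    stamped.append((time.monotonic(), "<end of trace>"))
    for gap, line in longest_gaps(stamped):
        print(f"{gap:8.1f}s pause after: {line[:120]}")
```

Hypothetical usage, assuming the script is saved as trace_gaps.py: append `--trace 2>&1 | python trace_gaps.py` to the pandoc command. The block to inspect in the source is the one that follows each reported line.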
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <m2a6w8ofib.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  2020-10-27 20:34 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  @ 2020-10-27 21:50 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  [not found] ` <22d3d478-357d-464c-b407-aefd2ed81dccn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-10-27 21:50 UTC
To: pandoc-discuss

With the nightly version (2.11.0.4),

    /tmp/pandoc -o epub/test.epub md/title.txt md2/ch*.md --css=css/stylesheet.css --epub-embed-font=fonts/* --epub-cover-image=images/cover.png

the conversion took seconds.

But pandoc complains:

    [WARNING] This document format requires a nonempty <title> element.
      Defaulting to 'title' as the title.
      To specify a title, use 'title' in metadata or --metadata title="...".

And epubcheck reports the following errors, probably related to the above warning:

    ERROR(RSC-005): epub/test.epub/EPUB/content.opf(9,14): Error while parsing file: element "metadata" incomplete; missing required element "dc:title"
    ERROR(RSC-005): epub/test.epub/EPUB/nav.xhtml(11,134): Error while parsing file: Anchors within nav elements must contain text

    Check finished with errors
    Messages: 0 fatal / 2 errors / 0 warnings / 0 info

    epubcheck completed

The title.txt file contains:

    % URBAIN DUBOIS
    % La cuisine classique — Volume II

It looks as if pandoc is unable to process the content of the title.txt file.

When I look at the output, everything looks good except that the raw LaTeX bits are now included verbatim, as if they were part of the text.
[parent not found: <22d3d478-357d-464c-b407-aefd2ed81dccn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <22d3d478-357d-464c-b407-aefd2ed81dccn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  @ 2020-10-28 0:28 ` John MacFarlane
  [not found] ` <m2y2jrurb4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2020-10-28 0:28 UTC
To: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss

"cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <cjns1989-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> The title.txt file contains:
>
> % URBAIN DUBOIS
> % La cuisine classique — Volume II

Weird. This SHOULD work. Are you seeing anything of this in the resulting epub? (I.e., did it get parsed, but not as metadata? If so, maybe you need a blank line at the end of title.txt.) (Also, I assume your input format is pandoc markdown? commonmark_x doesn't include an extension for this kind of title.)

> When I take a look at the output everything looks good except that the raw
> latex bits are now included verbatim as if they were part of the text/data.

They shouldn't be. Again, is pandoc markdown your input format? Could you post a sample of how these occur in the markdown file?
[parent not found: <m2y2jrurb4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <m2y2jrurb4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  @ 2020-10-28 18:10 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  [not found] ` <824220b2-6c2e-4c60-a935-e908f573a3d7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-10-28 18:10 UTC
To: pandoc-discuss

Sorry for the confusion: I copy-pasted the wrong pandoc command. The one I actually used for the run that "took seconds" was:

    pandoc -o epub/test.epub md/title.txt md/* --css=css/stylesheet.css --epub-embed-font=fonts/* --epub-cover-image=images/cover.png -f commonmark_x

And yes, I did see the content of the title.txt file verbatim in the output (same as the raw LaTeX stuff).

So basically, in my use case, this run of pandoc did little more than the cat command would, plus formatting the output as an EPUB file.

I have tons of script/regex-generated HTML and LaTeX code in this source, so the input has to be pandoc markdown.

The odd thing is that I have been doing this for ages (even Vol. I of this same book, which is similar) and never had anything that took this long to compile.

Otherwise, with the nightly and without the `-f commonmark_x` flag, the situation is unchanged.

Is there any way I could take a storage dump, a backtrace, or something similar when I kill the hung job?

Would some kind of filter that takes a snapshot of the internal state of the process help?

Thanks,

CJ

P.S. I apologize for the messy reports I have sent in lately, but I'm having major problems with this particular Google group. I had to switch to Google Chrome (a mess on Linux; I normally use Firefox) in order to be able to post. And the posts I tried to send from my mail client never made it to the group. I think I mentioned that this is not caused by my local setup, since I used someone else's account and machine and it still didn't go through. Any chance someone might look into this at some point?
[parent not found: <824220b2-6c2e-4c60-a935-e908f573a3d7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <824220b2-6c2e-4c60-a935-e908f573a3d7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  @ 2020-10-29 0:04 ` John MacFarlane
  [not found] ` <m28sbpucc4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  2020-10-30 10:21 ` BPJ
  2020-10-30 16:49 ` John MacFarlane
  2 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2020-10-29 0:04 UTC
To: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss

As I mentioned, --trace is the way to get an internal snapshot of parsing, at least at the block level. It sounds as if that did tell you where the parser is getting stuck (it would be AFTER the last traced block).

Putting raw tex blocks inside

    ```{=latex}
    ...
    ```

(the raw attribute syntax) will help the parser in tricky cases, so you might try that.
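For a source with many script-generated LaTeX blocks, the raw attribute fences suggested above could be added mechanically. The following is a rough sketch under stated assumptions, not a tool mentioned in the thread: the function name `fence_latex_environments` is hypothetical, and the pattern only handles environments whose `\begin{...}` starts a line and whose matching `\end{...}` ends a line. Nested environments of the same name, or environments inside existing code blocks, would need more care. Note also that blocks marked `{=latex}` are passed through only to LaTeX-based output, so they are dropped from the EPUB.

```python
import re

FENCE = "`" * 3            # three backticks, built indirectly to keep this listing simple
OPEN = FENCE + "{=latex}"  # opening line of a pandoc raw attribute block

# A \begin{env} at the start of a line, up to the first matching \end{env}
# that is followed only by optional spaces or tabs before the line ends.
ENV_RE = re.compile(
    r"^\\begin\{(?P<env>[A-Za-z*]+)\}.*?\\end\{(?P=env)\}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def fence_latex_environments(markdown):
    """Wrap each matched LaTeX environment in a raw attribute fence."""

    def wrap(match):
        # Leave the block alone if the previous line already opens a raw fence,
        # so that running the function twice does not nest fences.
        if markdown[: match.start()].endswith(OPEN + "\n"):
            return match.group(0)
        return f"{OPEN}\n{match.group(0)}\n{FENCE}"

    return ENV_RE.sub(wrap, markdown)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python fence_latex.py < chapter.md > chapter.fenced.md
    sys.stdout.write(fence_latex_environments(sys.stdin.read()))
```

Running it on a copy of the source and diffing the result before converting is a sensible precaution, since a regex cannot tell a real environment from one quoted inside a code sample.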
[parent not found: <m28sbpucc4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
  [not found] ` <m28sbpucc4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  @ 2020-10-29 23:35 ` Chris Jones
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Jones @ 2020-10-29 23:35 UTC
To: pandoc-discuss

With the raw LaTeX explicitly marked as such, as recommended above, the compilation takes minutes instead of hours.

To add to my embarrassment over this difficulty, I now remember that you told me not long ago to do this when pandoc goes postal. I guess I was too focused on the fact that I was creating an EPUB, not a LaTeX/PDF document, to remember this piece of advice.

After adding hundreds of such ```{=latex} tags the code does not look any cleaner, but it definitely addresses the problem.

As for generating a PDF from the same source, it takes quite a long time, but nothing out of the ordinary.

Thank you for your patience.
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
[not found] ` <824220b2-6c2e-4c60-a935-e908f573a3d7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-10-29 0:04 ` John MacFarlane
@ 2020-10-30 10:21 ` BPJ
2020-10-30 16:49 ` John MacFarlane
2 siblings, 0 replies; 13+ messages in thread
From: BPJ @ 2020-10-30 10:21 UTC (permalink / raw)
To: pandoc-discuss

On Wed, 28 Oct 2020 at 19:11, cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org <cjns1989-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> P.S. I apologize for the messy reports I have sent in lately but I'm having
> major problems with this particular google group. I had to switch to google
> chrome (a mess on linux. I normally use firefox)

How so? The switching as such or using Chrome? I have all of FF, Chrome and
Chromium installed on my system and just use whichever I want; usually
Chrome.

> in order to be able to post. And the posts I tried to send from my mail
> client never made it to the group. I think I mentioned that this is not
> caused by my local setup since I used someone else's account/machine and it
> still didn't go through. Any chance someone might look into this at some
> point?

Have you tried reading Google Groups in your email client? I have done so
for years without a hitch; I set it up so long ago that I unfortunately no
longer know which setting you have to (de)activate. I *think* that GG
messages automatically go to the main email address associated with your
Google account unless you deactivate it.

/bpj

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
[not found] ` <824220b2-6c2e-4c60-a935-e908f573a3d7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-10-29 0:04 ` John MacFarlane
2020-10-30 10:21 ` BPJ
@ 2020-10-30 16:49 ` John MacFarlane
[not found] ` <m2zh43psk7.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2020-10-30 16:49 UTC (permalink / raw)
To: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss

> P.S. I apologize for the messy reports I have sent in lately but I'm having
> major problems with this particular google group. I had to switch to google
> chrome (a mess on linux. I normally use firefox) in order to be able to
> post. And the posts I tried to send from my mail client never made it to
> the group. I think I mentioned that this is not caused by my local setup
> since I used someone else's account/machine and it still didn't go through.
> Any chance someone might look into this at some point?

Google's spam filter is sometimes over-aggressive.
I've just gone in and approved some pending messages, so
maybe that fixes the problem!

^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <m2zh43psk7.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
[not found] ` <m2zh43psk7.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2020-10-30 22:03 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
[not found] ` <20201030220312.GD5998-611mE6nXTcHDOqzlkpFKJg@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-10-30 22:03 UTC (permalink / raw)
To: pandoc-discuss

On Fri, Oct 30, 2020 at 12:49:44PM EDT, John MacFarlane wrote:
>
> > P.S. I apologize for the messy reports I have sent in lately but I'm having
> > major problems with this particular google group. I had to switch to google
> > chrome (a mess on linux. I normally use firefox) in order to be able to
> > post. And the posts I tried to send from my mail client never made it to
> > the group. I think I mentioned that this is not caused by my local setup
> > since I used someone else's account/machine and it still didn't go through.
> > Any chance someone might look into this at some point?
>
> Google's spam filter is sometimes over-aggressive.
> I've just gone in and approved some pending messages, so
> maybe that fixes the problem!

Much appreciated.

Thanks,

CJ

^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20201030220312.GD5998-611mE6nXTcHDOqzlkpFKJg@public.gmane.org>]
* Re: pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop
[not found] ` <20201030220312.GD5998-611mE6nXTcHDOqzlkpFKJg@public.gmane.org>
@ 2020-10-30 22:58 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
0 siblings, 0 replies; 13+ messages in thread
From: cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-10-30 22:58 UTC (permalink / raw)
To: pandoc-discuss

On Fri, Oct 30, 2020 at 06:03:12PM EDT, cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> On Fri, Oct 30, 2020 at 12:49:44PM EDT, John MacFarlane wrote:
> >
> > Google's spam filter is sometimes over-aggressive.
> > I've just gone in and approved some pending messages, so
> > maybe that fixes the problem!
>
> Much appreciated.
>
> Thanks,

Seems to have done the trick.

The above reply to your message got through and just came back to my mail
reader.

Thanks,

CJ

^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-10-30 22:58 UTC | newest]

Thread overview: 13+ messages
2020-10-26 19:22 pandoc.markdown to epub conversion took just under 4 hours on an average linux laptop Chris Jones
[not found] ` <af5fe26b-4d84-4dcb-bdcd-6382469c476ao-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-10-26 21:15 ` John MacFarlane
[not found] ` <m2a6w8ofib.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2020-10-27 20:34 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
[not found] ` <e9e43a84-9ec5-4732-8dec-e6caac2e59ffn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-10-27 21:05 ` John MacFarlane
2020-10-27 21:50 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
[not found] ` <22d3d478-357d-464c-b407-aefd2ed81dccn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-10-28 0:28 ` John MacFarlane
[not found] ` <m2y2jrurb4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2020-10-28 18:10 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
[not found] ` <824220b2-6c2e-4c60-a935-e908f573a3d7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-10-29 0:04 ` John MacFarlane
[not found] ` <m28sbpucc4.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2020-10-29 23:35 ` Chris Jones
2020-10-30 10:21 ` BPJ
2020-10-30 16:49 ` John MacFarlane
[not found] ` <m2zh43psk7.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2020-10-30 22:03 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
[not found] ` <20201030220312.GD5998-611mE6nXTcHDOqzlkpFKJg@public.gmane.org>
2020-10-30 22:58 ` cjns...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org