* Move TOC when converting html to docx
@ 2022-07-11  8:48 Ismail Jattioui
  0 siblings, 1 reply; 5+ messages in thread
From: Ismail Jattioui @ 2022-07-11  8:48 UTC
To: pandoc-discuss

Hi,

I am trying to convert an HTML file to docx using pandoc. My problem is
that I can't manage to move the table of contents to a specific position
in the document. I tried splitting my document into two and then merging
it again, but that isn't optimal: we are using it in production, it costs
us 2 calls to pandoc, and it isn't very maintainable.

I was wondering if there is a way to do that using Lua filters.

In a nutshell, let's say I have the following HTML document that I wish
to convert to DOCX:

    <!DOCTYPE html>
    <html lang="en">
    <head>
    <meta charset="UTF-8" />
    </head>
    <h1>Title 1</h1>
    <p>Some stuff 2</p>
    <h2>Subtitle 1</h2>
    <p>Some stuff 2</p>
    <div>Other things</div>
    <div id="TOC">Insert TOC below</div>
    </html>

How do I generate a table of contents below the div with the TOC id,
without splitting the document?

Thanks in advance

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/77066946-d07a-489a-9ec2-99796422f682n%40googlegroups.com.

* Re: Move TOC when converting html to docx
@ 2022-07-12 14:32 Ismail Jattioui
  1 sibling, 1 reply; 5+ messages in thread
From: Ismail Jattioui @ 2022-07-12 14:32 UTC
To: pandoc-discuss

I tried this code, which looked like what I want to do, but unfortunately
it still doesn't work.

There is apparently no RawBlock in the HTML I posted, and I don't see how
to add one.

I tried using Para and Block, with no success. I got the following error
at the line indicated by ---->

    PandocLuaError "Trying to set unavailable property text."

The command I am using:

    pandoc --metadata toc-title=custom-toc --lua-filter=filter.lua input-test.html -o res.docx

The Lua filter I am trying:

------------------------------------------------------
local RAW_TOC = [[
<w:sdt>
  <w:sdtContent xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:p>
      <w:r>
        <w:fldChar w:fldCharType="begin" w:dirty="true" />
        <w:instrText xml:space="preserve">TOC \o "1-3" \h \z \u</w:instrText>
        <w:fldChar w:fldCharType="separate" />
        <w:fldChar w:fldCharType="end" />
      </w:r>
    </w:p>
  </w:sdtContent>
</w:sdt>
]]
local meta_key = "toc-title"
local vars = {}

local function getVars (meta)
  for k, v in pairs(meta) do
    if v.t == 'MetaInlines' then
      print('isMetaInlines')
      vars["$" .. k .. "$"] = { table.unpack(v) }
    end
  end
end

local function pageBreak(el)
  if el.text == "pandoc-page-break" then
    print('pageBreak')
    return pandoc.Str ""
  else
    return el
  end
end

local function toc(el)
  print(el)
  if pandoc.utils.stringify(el) == "pandoc-toc" then
----> el.text = RAW_TOC
    el.format = "openxml"
    local para = pandoc.Para(vars)
    local div = pandoc.Div({ para, el })
    div["attr"]["attributes"]["custom-style"] = "TOC Heading"
    return div
  end
end

return {
  { Meta = getVars },
  { Str = pageBreak },
  { RawBlock = toc }
}
------------------------------------------------------

* Re: Move TOC when converting html to docx
@ 2022-07-18  6:33 Ismail Jattioui
  0 siblings, 0 replies; 5+ messages in thread
From: Ismail Jattioui @ 2022-07-18  6:33 UTC
To: pandoc-discuss

up please

* Re: Move TOC when converting html to docx
@ 2022-07-18  8:07 John MacFarlane
  1 sibling, 1 reply; 5+ messages in thread
From: John MacFarlane @ 2022-07-18  8:07 UTC
To: pandoc-discuss

There's a special syntax in the docx file to include the table of
contents; you're not going to be able to do it this way.

Maybe your best approach would be to have a script modify the docx after
pandoc produces it. A docx is just a zip file containing XML documents,
so you'd need to unzip it, modify document.xml, and zip it back up. The
modification would simply consist of moving the XML elements that produce
the TOC to another location in your document.xml.

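The post-processing step described above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not a definitive implementation and not the code from any gist in this thread. It assumes pandoc was run with --toc so that a TOC exists, that the TOC is a top-level <w:sdt> element containing a <w:docPartGallery> child, and that the placeholder is a top-level paragraph whose text is exactly "Insert TOC below". The function names move_toc and rewrite_docx and the output file name are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def move_toc(document_xml: bytes, marker: str) -> bytes:
    """Move the TOC block so it directly follows the paragraph whose text is `marker`."""
    # Re-register the prefixes declared in the input so that the output
    # keeps "w:" and friends instead of generated ns0/ns1 prefixes.
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(document_xml), events=["start-ns"]):
        ET.register_namespace(prefix, uri)

    root = ET.fromstring(document_xml)
    body = root.find("w:body", NS)

    # Assumption: the TOC is a top-level <w:sdt> that has a docPartGallery child.
    toc = next((el for el in body.findall("w:sdt", NS)
                if el.find(".//w:docPartGallery", NS) is not None), None)

    # Assumption: the placeholder is a top-level paragraph; its text may be
    # split over several runs, so the <w:t> pieces are joined before comparing.
    target = next((p for p in body.findall("w:p", NS)
                   if "".join(t.text or "" for t in p.iter(f"{{{W}}}t")).strip() == marker),
                  None)

    if toc is None or target is None:
        raise ValueError("TOC block or placeholder paragraph not found")

    body.remove(toc)
    body.insert(list(body).index(target) + 1, toc)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def rewrite_docx(src: str, dst: str, marker: str) -> None:
    """Copy the archive `src` to `dst`, changing only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = move_toc(data, marker)
            zout.writestr(item, data)
```

A call such as rewrite_docx("res.docx", "res-moved.docx", "Insert TOC below") would then run once after pandoc. Like the JSZip approach, this reads and writes archive members in memory rather than extracting them to disk. The placeholder paragraph is left in place; removing it with body.remove(target) would be a one-line extension.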
* Re: Move TOC when converting html to docx
@ 2022-07-21 13:48 Ismail Jattioui
  0 siblings, 0 replies; 5+ messages in thread
From: Ismail Jattioui @ 2022-07-21 13:48 UTC
To: pandoc-discuss

Thank you so much, it works!

Here is a boilerplate solution for anyone else who wants to try it in
JavaScript using the JSZip library (the advantage of this library is that
you don't have to extract all the files to disk in order to process
them):

https://gist.github.com/jaxalo/bd23a8db85ddc7afc5c9ca668b13c898
