From: Ismail Jattioui
Newsgroups: gmane.text.pandoc
Subject: Re: Move TOC when converting html to docx
Date: Thu, 21 Jul 2022 06:48:00 -0700 (PDT)
Message-ID: <464972bd-888a-4717-b668-51f0b6a13cd9n@googlegroups.com>
References: <77066946-d07a-489a-9ec2-99796422f682n@googlegroups.com>
To: pandoc-discuss

Thank you so much, it works!

Here is a boilerplate solution for anyone else who wants to try it in JavaScript using the JSZip library (the advantage of this library is that you don't have to extract all the files to disk in order to process them):

https://gist.github.com/jaxalo/bd23a8db85ddc7afc5c9ca668b13c898
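
For anyone not working in JavaScript, the same in-memory idea can be sketched in Python with the standard-library zipfile module instead of JSZip. This is only a sketch under assumptions, not the code from the gist: `rewrite_member` and `transform` are names made up for illustration, and it assumes the main document part lives at `word/document.xml` and is UTF-8 encoded, which should be checked against your own file.

```python
import io
import zipfile
from typing import Callable


def rewrite_member(docx_bytes: bytes,
                   transform: Callable[[str], str],
                   member: str = "word/document.xml") -> bytes:
    """Return a copy of the archive with `member` passed through `transform`.

    Everything happens in memory; nothing is extracted to disk.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                data = src.read(info.filename)
                if info.filename == member:
                    # Assumption: the member is UTF-8 encoded XML.
                    data = transform(data.decode("utf-8")).encode("utf-8")
                # Write the (possibly modified) entry to the new archive.
                dst.writestr(info.filename, data)
    return out.getvalue()
```

Here `transform` would be whatever function edits the XML string; the result can be written straight to a new .docx file.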

On Monday, 18 July 2022 at 10:07:07 UTC+2, fiddlosopher wrote:
There's a special syntax in the docx file to include the table of contents; you're not going to be able to do it this way.

Maybe your best approach would be to have a script modify the docx after pandoc produces it. A docx is just a zip file containing XML documents, so you'd need to unzip it, modify document.xml, and zip it back up. The modification would simply consist of moving the XML elements that produce the TOC to another location in your document.xml.
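
The modification step might look roughly like this in Python, treating document.xml as a plain string. It is a sketch under assumptions, not pandoc's own method: it assumes the TOC is the first `<w:sdt>...</w:sdt>` block in the file, that the marker text sits unbroken inside a single paragraph, and that paragraphs are not nested (no text boxes). `move_toc` is a hypothetical helper name; inspect your own document.xml before relying on any of this.

```python
import re

# Assumption: the TOC is the first <w:sdt>...</w:sdt> block in document.xml.
SDT_RE = re.compile(r"<w:sdt>.*?</w:sdt>", re.DOTALL)
# Matches <w:p> or <w:p ...> but not <w:pPr>, <w:pStyle>, etc.
PARA_RE = re.compile(r"<w:p[ >].*?</w:p>", re.DOTALL)


def move_toc(document_xml: str, marker: str) -> str:
    """Move the TOC block so it directly follows the paragraph containing `marker`."""
    toc = SDT_RE.search(document_xml)
    if toc is None:
        raise ValueError("no <w:sdt> block found; was the docx built with a TOC?")
    # Cut the TOC out first so its own paragraphs are not searched.
    rest = document_xml[:toc.start()] + document_xml[toc.end():]
    for para in PARA_RE.finditer(rest):
        if marker in para.group(0):
            # Re-insert the TOC right after the marker paragraph.
            return rest[:para.end()] + toc.group(0) + rest[para.end():]
    raise ValueError(f"marker text {marker!r} not found in any paragraph")
```

For the HTML in the original question, `marker` would be the text "Insert TOC below".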

> On Jul 11, 2022, at 10:48 AM, Ismail Jattioui <ismail.j...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> Hi,
>
> I am trying to convert an HTML file to docx using pandoc. My problem is that I can’t manage to move the table of contents to a specific position in the document. I tried splitting my document into two and then merging it again, but that isn’t optimal: we are using it in production, it costs us two calls to pandoc, and it isn't very maintainable.
>
> I was wondering if there is a way to do that using Lua filters.
>
> In a nutshell, let’s say I have the following HTML document that I wish to convert to DOCX:
>
> <!DOCTYPE html>
> <html lang="en">
> <head>
> <meta charset="UTF-8" />
> </head>
> <h1>Title 1</h1>
> <p>Some stuff 2</p>
> <h2>Subtitle 1</h2>
> <p>Some stuff 2</p>
> <div>Other things</div>
> <div id="TOC">Insert TOC below</div>
> </html>
>
> How do I generate a table of contents below the div with the TOC id, without splitting the document?
>
> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/77066946-d07a-489a-9ec2-99796422f682n%40googlegroups.com.
