From: Ismail Jattioui
Newsgroups: gmane.text.pandoc
Subject: Re: Move TOC when converting html to docx
Date: Tue, 12 Jul 2022 07:32:43 -0700 (PDT)
Message-ID: <88926968-1ca3-40c4-944f-c78e0554ba84n@googlegroups.com>
References: <77066946-d07a-489a-9ec2-99796422f682n@googlegroups.com>
To: pandoc-discuss

I tried this code, which looked like what I want to do, but unfortunately it still doesn't work.

There is apparently no RawBlock in the HTML I posted, and I don't see how we can add one.

I tried using Para and Block with no success :/ I got the following error:
PandocLuaError "Trying to set unavailable property text." at the line indicated by ---->

The command I am using:

pandoc --metadata toc-title=custom-toc --lua-filter=filter.lua input-test.html -o res.docx

The Lua filter I am trying:

------------------------------------------------------
local RAW_TOC = [[
<w:sdt><w:sdtContent xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p>
<w:r>
<w:fldChar w:fldCharType="begin" w:dirty="true" />
<w:instrText xml:space="preserve">TOC \o "1-3" \h \z \u</w:instrText>
<w:fldChar w:fldCharType="separate" />
<w:fldChar w:fldCharType="end" />
</w:r>
</w:p>
</w:sdtContent>
</w:sdt>
]]
local meta_key = "toc-title"
local vars = {}


local function getVars (meta)
   for k, v in pairs(meta) do
      if v.t == 'MetaInlines' then
         print('isMetaInlines')
         vars["$" .. k .. "$"] = { table.unpack(v) }
      end
   end
end

local function pageBreak(el)
   if el.text == "pandoc-page-break" then
      print('pageBreak')
      return pandoc.Str ""
   else
      return el
   end
end


local function toc(el)
   print(el)
   if pandoc.utils.stringify(el) == "pandoc-toc" then
      ----> el.text = RAW_TOC
      el.format = "openxml"
      local para = pandoc.Para(vars)
      local div = pandoc.Div({ para, el })
      div["attr"]["attributes"]["custom-style"] = "TOC Heading"
      return div
   end
end

return {
   { Meta = getVars },
   { Str = pageBreak },
   { RawBlock = toc }
}
------------------------------------------------------
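
One possible direction, sketched under assumptions and not tested against this exact document: the error above probably arises because only raw elements such as RawBlock carry `text` and `format` fields, so assigning `el.text` on a Para fails. Instead of mutating an existing element, a filter could match the Div whose identifier is `TOC` and return it followed by a newly built `pandoc.RawBlock`. This assumes the HTML reader hands the filter a Div with identifier `TOC` for `<div id="TOC">`; the field XML is copied from the filter above.

```lua
-- Sketch only: assumes pandoc's HTML reader yields a Div with identifier
-- "TOC" for <div id="TOC">; not verified against the document above.

-- Word field code for a table of contents covering heading levels 1-3.
local RAW_TOC = [[
<w:sdt>
<w:sdtContent xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p>
<w:r>
<w:fldChar w:fldCharType="begin" w:dirty="true" />
<w:instrText xml:space="preserve">TOC \o "1-3" \h \z \u</w:instrText>
<w:fldChar w:fldCharType="separate" />
<w:fldChar w:fldCharType="end" />
</w:r>
</w:p>
</w:sdtContent>
</w:sdt>
]]

-- Global function named after the element type; pandoc applies it to
-- every Div when the filter file itself returns nothing.
function Div(el)
   if el.identifier == "TOC" then
      -- Keep the placeholder Div and splice the raw OpenXML block after it.
      return { el, pandoc.RawBlock("openxml", RAW_TOC) }
   end
   -- Returning nil leaves all other Divs unchanged.
end
```

Saved as filter.lua, this would be run with the same command as above. Because the field is marked dirty, Word would typically ask to update it when the file is opened. The custom title and the "TOC Heading" style from my filter are left out here to keep the sketch short.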
On Monday, 11 July 2022 at 10:48:41 UTC+2, Ismail Jattioui wrote:

> Hi,
>
> I am trying to convert an HTML file to docx using pandoc. My problem is that I can't manage to move the table of contents to a specific position in the document. I tried splitting my document in two and then merging it again, but that isn't optimal: we are using it in production, it costs us two calls to pandoc, and it isn't very maintainable.
>
> I was wondering if there is a way to do that using Lua filters.
>
> In a nutshell, let's say I have the following HTML document that I wish to convert to DOCX:
>
> <!DOCTYPE html>
> <html lang="en">
>     <head>
>         <meta charset="UTF-8" />
>     </head>
>     <h1>Title 1</h1>
>     <p>Some stuff 2</p>
>     <h2>Subtitle 1</h2>
>     <p>Some stuff 2</p>
>     <div>Other things</div>
>     <div id="TOC">Insert TOC below</div>
> </html>
>
> How do I manage to generate a table of contents below the div with the TOC id, without splitting the document?
>
> Thanks in advance
