From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ismail Jattioui
Newsgroups: gmane.text.pandoc
Subject: Re: Move TOC when converting html to docx
Date: Sun, 17 Jul 2022 23:33:09 -0700 (PDT)
References: <77066946-d07a-489a-9ec2-99796422f682n@googlegroups.com> <88926968-1ca3-40c4-944f-c78e0554ba84n@googlegroups.com>
In-Reply-To: <88926968-1ca3-40c4-944f-c78e0554ba84n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss
Precedence: list
Xref: news.gmane.io gmane.text.pandoc:31002

up please

On Tuesday, 12 July 2022 at 16:32:43 UTC+2, Ismail Jattioui wrote:

> I tried this code, which looked like what I want to do, but unfortunately
> it still doesn’t work.
>
> There is apparently no RawBlock in the HTML I posted, and I don't see how
> to add one.
>
> I tried using Para and Block with no success :/ I got the following error
> at the line indicated by ---->:
>
> PandocLuaError "Trying to set unavailable property text."
>
> The command I am using:
>
> pandoc --metadata toc-title=custom-toc --lua-filter=filter.lua input-test.html -o res.docx
>
> The Lua filter I am trying:
>
> ------------------------------------------------------
> local RAW_TOC = [[
> <w:sdt>
> <w:sdtContent xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
> <w:p>
> <w:r><w:fldChar w:fldCharType="begin" w:dirty="true" />
> <w:instrText xml:space="preserve">TOC \o "1-3" \h \z \u</w:instrText>
> <w:fldChar w:fldCharType="separate" />
> <w:fldChar w:fldCharType="end" />
> </w:r>
> </w:p>
> </w:sdtContent>
> </w:sdt>
> ]]
> local meta_key = "toc-title"
> local vars = {}
>
>
> local function getVars (meta)
>    for k, v in pairs(meta) do
>       if v.t == 'MetaInlines' then
>          print('isMetaInlines')
>          vars["$" .. k .. "$"] = { table.unpack(v) }
>       end
>    end
> end
>
> local function pageBreak(el)
>    if el.text == "pandoc-page-break" then
>       print('pageBreak')
>       return pandoc.Str ""
>    else
>       return el
>    end
> end
>
>
> local function toc(el)
>    print(el)
>    if pandoc.utils.stringify(el) ==  "pandoc-toc" then
>       ----> el.text = RAW_TOC
>       el.format = "openxml"
>       local para = pandoc.Para(vars)
>       local div = pandoc.Div({ para, el })
>       div["attr"]["attributes"]["custom-style"] = "TOC Heading"
>       return div
>    end
> end
>
> return {
>    { Meta = getVars },
>    { Str = pageBreak },
>    { RawBlock = toc }
> }
> ------------------------------------------------------
>
> On Monday, 11 July 2022 at 10:48:41 UTC+2, Ismail Jattioui wrote:
>
>> Hi,
>>
>> I am trying to convert an HTML file to DOCX using pandoc. My problem is
>> that I can’t manage to move the table of contents to a specific position
>> in the document. I tried splitting my document in two and then merging
>> the parts again, but that isn’t optimal: we use it in production, it
>> costs us two calls to pandoc, and it isn't very maintainable.
>>
>> I was wondering if there is a way to do that using Lua filters.
>>
>> In a nutshell, let’s say I have the following HTML document that I wish
>> to convert to DOCX:
>>
>> <!DOCTYPE html>
>> <html lang="en">
>>     <head>
>>         <meta charset="UTF-8" />
>>     </head>
>>     <h1>Title 1</h1>
>>     <p>Some stuff 2</p>
>>     <h2>Subtitle 1</h2>
>>     <p>Some stuff 2</p>
>>     <div>Other things</div>
>>     <div id="TOC">Insert TOC below</div>
>> </html>
>>
>> How do I generate a table of contents below the div with the TOC id,
>> without splitting the document?
>>
>> Thanks in advance
>>
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a9967f45-314e-484c-a642-ecb03c315e10n%40googlegroups.com.
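
One possible direction, offered as an untested sketch rather than a confirmed fix. The HTML reader most likely turns the `id="TOC"` div into a Div block, not a RawBlock, so a `RawBlock` filter function never sees it, and `text` is not a property that can be set on a Div or Para. A filter could instead match the Div by its identifier and return it followed by a new `pandoc.RawBlock("openxml", ...)` that carries the field code. The sketch assumes the placeholder keeps the id `TOC` from the example document and reuses the field code from the filter quoted in this thread. `TOC_ID` and `TOC_FIELD` are illustrative names, not part of pandoc.

```lua
-- Sketch of a pandoc Lua filter: emit a Word TOC field right after the
-- placeholder div. TOC_ID and TOC_FIELD are illustrative names (assumptions).
local TOC_ID = "TOC"

-- Field code copied from the filter quoted in this thread.
local TOC_FIELD = [[
<w:sdt>
<w:sdtContent xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p>
<w:r><w:fldChar w:fldCharType="begin" w:dirty="true" />
<w:instrText xml:space="preserve">TOC \o "1-3" \h \z \u</w:instrText>
<w:fldChar w:fldCharType="separate" />
<w:fldChar w:fldCharType="end" />
</w:r>
</w:p>
</w:sdtContent>
</w:sdt>
]]

-- Pandoc calls a global function named after an element type for each
-- element of that type. Assumption: the div with id "TOC" arrives as a Div.
function Div(el)
  if el.identifier == TOC_ID then
    -- Returning a list replaces the Div with these blocks, in order:
    -- the original placeholder, then the raw OpenXML field.
    return { el, pandoc.RawBlock("openxml", TOC_FIELD) }
  end
  -- Returning nil leaves every other Div unchanged.
  return nil
end
```

Saved as filter.lua, this would be run with the same command as before, for example `pandoc --lua-filter=filter.lua input-test.html -o res.docx`. Building a new RawBlock avoids assigning `text` to an element that has no such property, and the whole conversion stays in a single pandoc call.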