From: Ben Menashe
Newsgroups: gmane.text.pandoc
Subject: Re: docx -> gfm with custom styles
Date: Sat, 18 Feb 2023 11:26:01 -0800 (PST)
To: pandoc-discuss
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
References: <3909f520-e8db-4cf9-900d-6a5a858c1a18n@googlegroups.com> <52ada5c3-e26e-4c8c-8b3f-b55bb8ce8e1en@googlegroups.com>

Hmm, yeah, I see. It's part of a large original docx, so it was failing on another element that was styled as Example and had italics applied.
I printed the div in the Lua filter. When it works, I see this:

```
Div ("",[],[("custom-style","Example")]) [Para [Str "Test",Space,Str "example"]]
```

and when it fails, I see this:

```
Div ("",[],[("custom-style","Example")]) [BlockQuote [Para [Emph [Str "Example:"]]]]
```

Is there any clean way to approach this so that it works generically and preserves any other formatting applied?
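
One possible direction, as a minimal sketch rather than a definitive answer: instead of reaching into `div.content[1].content`, which only works when the first child is a Para or Plain, the whole Div body could be flattened with `pandoc.utils.blocks_to_inlines`. That function accepts any list of blocks (Para, BlockQuote, and so on) and keeps inline formatting such as Emph. The sketch assumes every styled Div should still become a single level-2 heading, as in the filter quoted further down; the table name `heading_styles` is hypothetical.

```lua
-- Sketch only: style names and heading level come from the filter quoted
-- in this thread; `heading_styles` is a hypothetical name.
local heading_styles = {
  ['Internal Heading'] = 2,
  ['Example'] = 2,
}

function Div(div)
  local level = heading_styles[div.attributes['custom-style']]
  if not level then
    return nil  -- returning nil leaves every other Div unchanged
  end
  -- Flatten whatever blocks the Div holds into one list of inlines.
  -- The second argument is the separator placed between blocks; a plain
  -- space is passed here instead of the default separator.
  local inlines = pandoc.utils.blocks_to_inlines(div.content, { pandoc.Space() })
  return pandoc.Header(level, inlines)
end
```

Saved under a hypothetical name such as `styles.lua`, it would be added to the original command with `--lua-filter=styles.lua`. If the BlockQuote wrapper itself needs to survive in the output, a heading is the wrong target and the Div would have to be mapped to a different block instead.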
On Saturday, February 18, 2023 at 1:19:46 AM UTC-7 Bastien DUMONT wrote:
With your examples, I get:

## Scope

<div custom-style="Body Text">

Test body

</div>

## Test nested

On Friday 17 February 2023 at 07:00:47AM, Ben Menashe wrote:
> Thank you so much, that worked. I was missing the [1].content.
> But let's say I have another 'Example' custom style under it. Without the Lua filter
> it renders this structure:
>
> ```
> <div custom-style="Internal Heading">
>
> Scope
>
> </div>
>
> <div custom-style="Body Text">
>
> Test body
>
> </div>
>
> <div custom-style="Example">
>
> Test nested
>
> </div>
> ```
>
> And with the filter below it fails on line 8 with this error: "Inline, list of
> Inlines, or string expected, got Blocks". Any idea how to troubleshoot
> such issues?
>
> ```
> return {
>   {
>     Div = function (div)
>       if (div.attributes['custom-style'] == 'Internal Heading') then
>         return pandoc.Header(2, div.content[1].content)
>       end
>       if (div.attributes['custom-style'] == 'Example') then
>         return pandoc.Header(2, div.content[1].content)
>       end
>
>       return div
>     end,
>   }
> }
> ```
> On Friday, February 17, 2023 at 1:10:11 AM UTC-7 Bastien DUMONT wrote:
>
> In this case, it would be preferable to turn the div into a Header element
> and let Pandoc format it itself:
>
> ```
> function Div(div)
>   if div.attributes['custom-style'] == 'Internal Heading' then
>     return pandoc.Header(2, div.content[1].content)
>   end
> end
> ```
>
> On Thursday 16 February 2023 at 08:00:08PM, Ben Menashe wrote:
> > Hi,
> > We need to convert docx to gfm.
> > Since the docx has some user-defined styles, we use the "+styles" extension:
> >
> > pandoc --to=gfm -f docx+styles --output=rtb.md --extract-media=. --wrap=none 'rtb.docx'
> >
> > So now we have an HTML div that wraps our content. Let's say I want to transform this:
> >
> > <div custom-style="Internal Heading">
> >
> > Scope
> >
> > </div>
> >
> > Into:
> >
> > ## Scope
> >
> > How can it be done? I tried to set up a Lua filter, but I have not managed to
> > have it output "##" along with the div content.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email
> > to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/pandoc-discuss/3909f520-e8db-4cf9-900d-6a5a858c1a18n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/ef5a0088-1df4-4540-98d5-a0120df8f3cen%40googlegroups.com.