From: Sigismond
Newsgroups: gmane.text.pandoc
Subject: Re: docx+styles to dokuwiki somehow ?
Date: Tue, 27 Jun 2023 02:35:06 -0700 (PDT)
References: <16df0de5-a608-4e6e-9545-3fa338229d8fn@googlegroups.com>
To: pandoc-discuss

    Well… it does work but, somehow, docx+styles messes with the lists:
    For a simple docx with just one unordered list, here is what I get with -f docx+styles -t dokuwiki:
    <HTML><ul></HTML>
    <HTML><li></HTML><HTML><p></HTML>Liste 1<HTML></p></HTML>
    <HTML></li></HTML>
    <HTML><li></HTML><HTML><p></HTML>liste 2<HTML></p></HTML>
    <HTML></li></HTML>
    <HTML><li></HTML><HTML><p></HTML>liste 3<HTML></p></HTML>
    <HTML><ul></HTML>
    <HTML><li></HTML><HTML><p></HTML>liste 3a<HTML></p></HTML>
    <HTML></li></HTML>
    <HTML><li></HTML><HTML><p></HTML>liste 3b<HTML></p></HTML>
    <HTML></li></HTML>
    <HTML><li></HTML><HTML><p></HTML>liste 3c<HTML></p></HTML>
    <HTML></li></HTML><HTML></ul></HTML>
    <HTML></li></HTML>
    <HTML><li></HTML><HTML><p></HTML>liste 4<HTML></p></HTML>
    <HTML></li></HTML><HTML></ul></HTML>

    Which is not parsed by dokuwiki.

    Without +styles:
      * Liste 1
      * liste 2
      * liste 3
        * liste 3a
        * liste 3b
        * liste 3c
      * liste 4

    Which is syntactically correct dokuwiki format.

    If I understand correctly, Pandoc seems to treat the list as badly formatted only when +styles is applied, and it then emits raw HTML with <p> tags inside the <li>s.

    So what is going on? Is it a bad implementation in the Dokuwiki writer?
    How can I benefit from both +styles (with my Lua filter) and lists?
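
One direction that might be worth trying, sketched here under an unverified assumption: with +styles the docx reader appears to wrap each list paragraph in a Div carrying a custom-style attribute, and a list item that starts with a Div rather than a plain paragraph is what seems to push the dokuwiki writer to its raw HTML fallback. The filter below removes those Div wrappers inside bullet and ordered lists only. The file name unwrap-list-divs.lua and the helper names unwrap_divs and unwrap_list are made up for illustration.

```lua
-- unwrap-list-divs.lua (hypothetical file name)
-- Assumption: with -f docx+styles, list paragraphs arrive wrapped in Divs,
-- and those wrappers are what trigger the raw-HTML list output.

-- Return a flat list of blocks in which every Div is replaced by its content.
local function unwrap_divs(blocks)
  local result = {}
  for _, block in ipairs(blocks) do
    if block.t == 'Div' then
      -- Recurse so that nested Divs are flattened as well.
      for _, inner in ipairs(unwrap_divs(block.content)) do
        result[#result + 1] = inner
      end
    else
      result[#result + 1] = block
    end
  end
  return result
end

-- Strip Div wrappers from every item of a bullet or ordered list.
local function unwrap_list(list)
  local items = list.content
  for i, item in ipairs(items) do
    items[i] = unwrap_divs(item)
  end
  list.content = items
  return list
end

-- Pandoc picks up global functions named after element types.
BulletList = unwrap_list
OrderedList = unwrap_list
```

This would have to run before the custom-style filter, for instance as a separate, earlier --lua-filter option on the same pandoc command line, so that list items are unwrapped before the remaining Divs are turned into raw blocks. The trade-off is that any custom style attached to a list paragraph is dropped.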

    --
    Pascal
    On Monday 26 June 2023 at 16:04:17 UTC+2, Sigismond wrote:
    Thanks a lot Bastien, it works perfectly well.

    On Monday 26 June 2023 at 15:47:00 UTC+2, Bastien DUMONT wrote:
    With `-f docx+styles`, you can replace the divs that carry custom styles with this kind of filter:

    ```
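    function Div (div)
      -- With docx+styles, the Word style name is stored in 'custom-style'.
      local custom_style = div.attributes['custom-style']
      if custom_style then
        -- Raw dokuwiki markup that opens and closes the custom tag.
        local pre = pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">')
        local post = pandoc.RawBlock('dokuwiki', '</WARP>')
        local content = div.content
        table.insert(content, 1, pre)
        table.insert(content, post)
        -- Returning a list of blocks replaces the Div with those blocks.
        return content
      end
    end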
    ```

    On Monday 26 June 2023 at 06:16:48AM, Sigismond wrote:
    > OK, let's try it another way:
    >
    > I plan to use Pandoc to convert several docx files to dokuwiki format.
    > I need to retain custom block styles and convert them to custom tags,
    > something like
    >
    > <WARP my-custom-block-style>
    > my dokuwiki formatted block text
    > </WARP>
    >
    > Do I need to develop a custom dokuwiki writer from scratch to do that, or is
    > there a way to use Lua filters for this purpose?
    > Sorry if the answer is obvious, but I struggle to find relevant information.
    >
    > Thanks for any help,
    > --
    > Pascal
    >
    > On Wednesday 26 April 2023 at 16:14:20 UTC+2, pascal Conil-lacoste wrote:
    >
    > Hi everybody,
    >
    > I've been using pandoc for some years to accomplish very straightforward
    > conversions.
    > Now that what I plan to do is a little more complex, I struggle to find
    > relevant information.
    >
    > I need to convert docx to dokuwiki and retain Word custom styles. I thought
    > I could use docx+styles to get custom-styles in dokuwiki files, but they
    > don't make it to the output and get stripped.
    >
    > I would be happy with ::: {custom-style="myStyle"} my text here :::
    >
    > If I could get something along these lines, I would be able to apply some
    > other simple transformation to get to the final dokuwiki files and treat
    > them with a plugin.
    >
    > What is the best way to achieve this? Filters? Templates?
    >
    > Any help welcome!

    --
    You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
    To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f0b95670-24a3-4870-842f-fb6e7791a694n%40googlegroups.com.