public inbox archive for pandoc-discuss@googlegroups.com
From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: docx+styles to dokuwiki somehow ?
Date: Tue, 27 Jun 2023 09:53:42 +0000
Message-ID: <ZJqxpgF3fu2oa_vm@localhost>
In-Reply-To: <f0b95670-24a3-4870-842f-fb6e7791a694n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

I think this is worth a bug report if one has not been filed yet. As a workaround, you can extend the filter to strip all divs that have a custom-style attribute from inside the bullet lists.

```
-- Second pass: wrap each remaining div that carries a custom style
-- in raw DokuWiki markup.
function Div (div)
  local custom_style = div.attributes['custom-style']
  if custom_style then
    local pre = pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">')
    local post = pandoc.RawBlock('dokuwiki', '</WARP>')
    table.insert(div.content, post)
    table.insert(div.content, 1, pre)
    -- Returning the content replaces the div with its wrapped blocks.
    return div.content
  end
end

-- Sub-filter that unwraps divs with a custom style, keeping their content.
local remove_custom_styles = {
  Div = function(div)
    if div.attributes['custom-style'] then
      return div.content
    end
  end
}

function BulletList(list)
  -- Do the same for all types that are badly handled with docx+styles
  -- (e.g. OrderedList)
  return list:walk(remove_custom_styles)
end

return {
  -- We must process the bullet lists first to remove the divs
  -- before they are converted to raw code.
  { BulletList = BulletList },
  { Div = Div }
}
```
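
One way to apply the comment about other list types is sketched below: the same unwrapping function is registered for both `BulletList` and `OrderedList` in the first pass. This is only a sketch under stated assumptions: the names `unstyle` and `wrap` are illustrative and do not come from the thread, and the `walk` method on elements assumes pandoc 2.17 or later.

```lua
-- Sketch only: `unstyle` and `wrap` are illustrative names, not from the thread.
-- Assumes pandoc 2.17 or later, where block elements have a `walk` method.

-- Sub-filter: replace every div that has a custom style with its content.
local remove_custom_styles = {
  Div = function (div)
    if div.attributes['custom-style'] then
      return div.content
    end
  end
}

-- First pass: strip styled divs from inside a list, whatever its type.
local function unstyle (list)
  return list:walk(remove_custom_styles)
end

-- Second pass: wrap the remaining styled divs in raw DokuWiki tags.
local function wrap (div)
  local custom_style = div.attributes['custom-style']
  if custom_style then
    local blocks = div.content
    table.insert(blocks, 1, pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">'))
    table.insert(blocks, pandoc.RawBlock('dokuwiki', '</WARP>'))
    return blocks
  end
end

return {
  -- The lists are handled first, so the divs inside them are gone
  -- before the second pass turns styled divs into raw code.
  { BulletList = unstyle, OrderedList = unstyle },
  { Div = wrap }
}
```

Saved under a hypothetical file name such as `styles.lua`, it could be run with `pandoc -f docx+styles -t dokuwiki --lua-filter=styles.lua input.docx -o output.txt` (the input and output names are placeholders as well).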

On Tuesday 27 June 2023 at 02:35:06AM, Sigismond wrote:
> Well… it does work, but somehow docx+styles messes with the lists.
> For a simple docx containing just one unordered list, here is what I get with
> -f docx+styles -t dokuwiki:
> <HTML><ul></HTML>
> <HTML><li></HTML><HTML><p></HTML>Liste 1<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 2<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3<HTML></p></HTML>
> 
> <HTML><ul></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3a<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3b<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3c<HTML></p></HTML>
> <HTML></li></HTML><HTML></ul></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 4<HTML></p></HTML>
> <HTML></li></HTML><HTML></ul></HTML>
> 
> Which is not parsed by dokuwiki.
> 
> 
> Without +styles:
>   * Liste 1
>   * liste 2
>   * liste 3
>     * liste 3a
>     * liste 3b
>     * liste 3c
>   * liste 4
> 
> Which is syntactically correct dokuwiki format.
> 
> If I understand correctly, Pandoc seems to treat an unordered list as badly
> formatted only when +styles is applied, and it spits out raw HTML with <p>
> tags inside the <li>s.
> 
> So what is it? A bad implementation in the DokuWiki writer?
> How can I benefit from both +styles, with my Lua filter, and lists?
> 
> --
>   Pascal
> On Monday 26 June 2023 at 16:04:17 UTC+2, Sigismond wrote:
> 
>     Thanks a lot Bastien, it works perfectly well.
> 
>     On Monday 26 June 2023 at 15:47:00 UTC+2, Bastien DUMONT wrote:
> 
>         With `-f docx+styles`, you can replace the divs with custom styles with
>         this kind of filter:
> 
>         ```
>         function Div (div)
>           local custom_style = div.attributes['custom-style']
>           if custom_style then
>             local pre = pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">')
>             local post = pandoc.RawBlock('dokuwiki', '</WARP>')
>             local content = div.content
>             table.insert(content, 1, pre)
>             table.insert(content, post)
>             return content
>           end
>         end
>         ```
> 
>         On Monday 26 June 2023 at 06:16:48AM, Sigismond wrote:
>         > OK, let's try it another way:
>         >
>         > I plan to use Pandoc to convert several docx files to dokuwiki format.
>         > I need to retain custom block styles and convert them to custom tags,
>         > something like
>         >
>         > <WARP my-custom-block-style>
>         > my dokuwiki formatted block text
>         > </WARP>
>         >
>         > Do I need to develop a custom dokuwiki writer from scratch to do that, or is
>         > there a way to use Lua filters for this purpose?
>         > Sorry if the answer is obvious, but I struggle to find relevant information.
>         >
>         > Thanks for any help,
>         > --
>         >   Pascal
>         >
>         >
>         > On Wednesday 26 April 2023 at 16:14:20 UTC+2, pascal Conil-lacoste wrote:
>         >
>         > Hi everybody,
>         >
>         > I've been using pandoc for some years to accomplish very straightforward
>         > conversions.
>         > Now that what I plan to do is a little more complex, I struggle to find
>         > relevant information.
>         >
>         > I need to convert docx to dokuwiki and retain Word custom styles. I thought
>         > I could use docx+styles to get custom-styles in dokuwiki files, but they
>         > don't make it to the output and get stripped.
>         >
>         > I would be happy with ::: {custom-style="myStyle"} my text here :::
>         >
>         > If I could get something along these lines, I would be able to apply some
>         > other simple transformation to get to the final dokuwiki files and treat
>         > them with a plugin.
>         >
>         > What is the best way to achieve this? Filters? Templates?
>         >
>         > Any help welcome!
>         >
>         > --
>         > You received this message because you are subscribed to the Google Groups
>         > "pandoc-discuss" group.
>         > To unsubscribe from this group and stop receiving emails from it, send an email
>         > to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>         > To view this discussion on the web visit
>         > https://groups.google.com/d/msgid/pandoc-discuss/bdc377c4-3918-4f0f-a87e-a66f9d128cc2n%40googlegroups.com.
> 
> 



Thread overview: 10+ messages
2023-04-26 14:14 pascal Conil-lacoste
     [not found] ` <16df0de5-a608-4e6e-9545-3fa338229d8fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-06-26 13:16   ` Sigismond
     [not found]     ` <bdc377c4-3918-4f0f-a87e-a66f9d128cc2n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-06-26 13:46       ` Bastien DUMONT
2023-06-26 14:04         ` Sigismond
     [not found]           ` <d22b9383-2891-44f7-8f4a-1867eef83fe2n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-06-27  9:35             ` Sigismond
     [not found]               ` <f0b95670-24a3-4870-842f-fb6e7791a694n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-06-27  9:53                 ` Bastien DUMONT [this message]
2023-06-27 10:21                   ` Sigismond
     [not found]                     ` <a62eaa45-0126-4325-878e-4dae06aba21an-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-06-28 15:00                       ` Sigismond
     [not found]                         ` <62b0db64-b7ab-48e8-9025-9c969304e1b6n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-06-28 16:52                           ` Bastien DUMONT
2023-06-29 13:11                             ` Sigismond
