From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: docx+styles to dokuwiki somehow ?
Date: Tue, 27 Jun 2023 09:53:42 +0000
Message-ID: <ZJqxpgF3fu2oa_vm@localhost>
In-Reply-To: <f0b95670-24a3-4870-842f-fb6e7791a694n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
I think this is worth a bug report, if one has not been filed yet. As a workaround, you can extend the filter so that it strips every div with a custom-style attribute from inside bullet lists before the remaining divs are wrapped.
```
function Div (div)
  local custom_style = div.attributes['custom-style']
  if custom_style then
    local pre = pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">')
    local post = pandoc.RawBlock('dokuwiki', '</WARP>')
    table.insert(div.content, post)
    table.insert(div.content, 1, pre)
    return div.content
  end
end

local remove_custom_styles = {
  Div = function(div)
    if div.attributes['custom-style'] then
      return div.content
    end
  end
}

function BulletList(list)
  -- Do the same for all types that are badly handled with docx+styles
  -- (e.g. OrderedList)
  return list:walk(remove_custom_styles)
end

return {
  -- We must process the bullet lists first to remove the divs
  -- before they are converted to raw code.
  { BulletList = BulletList },
  { Div = Div }
}
```
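The comment in `BulletList` suggests treating other list types the same way. Here is a minimal sketch of that extension, assuming pandoc 2.17 or later (needed for the `walk` method on elements) and assuming that ordered lists are affected in the same way as bullet lists, which has not been checked in this thread. The names `strip_styles` and `wrap_div` are hypothetical helpers introduced only for this example.

```lua
-- Sub-filter applied inside lists: unwrap every div that carries a custom style.
local remove_custom_styles = {
  Div = function (div)
    if div.attributes['custom-style'] then
      return div.content
    end
  end
}

-- Hypothetical helper: remove styled divs from any list block.
local function strip_styles (list)
  return list:walk(remove_custom_styles)
end

-- Hypothetical helper: wrap the remaining styled divs in raw dokuwiki tags,
-- exactly as the Div function of the filter above does.
local function wrap_div (div)
  local custom_style = div.attributes['custom-style']
  if custom_style then
    local content = div.content
    table.insert(content, 1, pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">'))
    table.insert(content, pandoc.RawBlock('dokuwiki', '</WARP>'))
    return content
  end
end

return {
  -- First pass: clean both list types (the OrderedList entry is the assumption).
  { BulletList = strip_styles, OrderedList = strip_styles },
  -- Second pass: convert the divs that are left into raw dokuwiki blocks.
  { Div = wrap_div }
}
```

Saved under any file name, this would be passed to pandoc with `--lua-filter` together with `-f docx+styles -t dokuwiki`. Other block types that turn out to be affected could be added to the first table in the same way.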
On Tuesday 27 June 2023 at 02:35:06AM, Sigismond wrote:
> Well… it does work but, somehow, docx+styles messes up the lists.
> For a simple docx containing just one unordered list, here is what I get with
> -f docx+styles -t dokuwiki:
> <HTML><ul></HTML>
> <HTML><li></HTML><HTML><p></HTML>Liste 1<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 2<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3<HTML></p></HTML>
>
> <HTML><ul></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3a<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3b<HTML></p></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 3c<HTML></p></HTML>
> <HTML></li></HTML><HTML></ul></HTML>
> <HTML></li></HTML>
> <HTML><li></HTML><HTML><p></HTML>liste 4<HTML></p></HTML>
> <HTML></li></HTML><HTML></ul></HTML>
>
> Which is not parsed by dokuwiki.
>
>
> Without +styles:
>   * Liste 1
>   * liste 2
>   * liste 3
>     * liste 3a
>     * liste 3b
>     * liste 3c
>   * liste 4
>
> Which is syntactically correct dokuwiki format.
>
> If I understand correctly, pandoc seems to treat the list as badly
> formatted only when +styles is applied, and it then emits raw HTML with <p>
> tags inside the <li> elements.
>
> So what is going on? A bad implementation in the dokuwiki writer?
> How can I use +styles with my Lua filter and still get proper lists?
>
> --
> Pascal
> On Monday 26 June 2023 at 16:04:17 UTC+2, Sigismond wrote:
>
> Thanks a lot Bastien, it works perfectly well.
>
> On Monday 26 June 2023 at 15:47:00 UTC+2, Bastien DUMONT wrote:
>
> With `-f docx+styles`, you can replace the divs with custom styles with
> this kind of filter:
>
> ```
> function Div (div)
>   local custom_style = div.attributes['custom-style']
>   if custom_style then
>     local pre = pandoc.RawBlock('dokuwiki', '<WARP "' .. custom_style .. '">')
>     local post = pandoc.RawBlock('dokuwiki', '</WARP>')
>     local content = div.content
>     table.insert(content, 1, pre)
>     table.insert(content, post)
>     return content
>   end
> end
> ```
>
> On Monday 26 June 2023 at 06:16:48AM, Sigismond wrote:
> > OK, let's try it another way:
> >
> > I plan to use Pandoc to convert several docx files to dokuwiki format.
> > I need to retain custom block styles and convert them to custom tags,
> > something like
> >
> > <WARP my-custom-block-style>
> > my dokuwiki formatted block text
> > </WARP>
> >
> > Do I need to develop a custom dokuwiki writer from scratch to do that, or is
> > there a way to use Lua filters for this purpose?
> > Sorry if the answer is obvious, but I struggle to find relevant information.
> >
> > Thanks for any help,
> > --
> > Pascal
> >
> >
> > On Wednesday 26 April 2023 at 16:14:20 UTC+2, pascal Conil-lacoste wrote:
> >
> > Hi everybody,
> >
> > I've been using pandoc for some years to accomplish very straightforward
> > conversions.
> > Now that what I plan to do is a little more complex, I struggle to find
> > relevant information.
> >
> > I need to convert docx to dokuwiki and retain Word custom styles. I thought
> > I could use docx+styles to get custom-styles in dokuwiki files but they
> > don't make it to the output and get stripped.
> >
> > I would be happy with ::: {custom-style="myStyle"} my text here :::
> >
> > If I could get something along these lines, I would be able to apply some
> > other simple transformation to get to the final dokuwiki files and treat
> > them with a plugin.
> >
> > What is the best way to achieve this? Filters? Templates?
> >
> > Any help welcome!
> >
> > --
> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/bdc377c4-3918-4f0f-a87e-a66f9d128cc2n%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/ZJqxpgF3fu2oa_vm%40localhost.