From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Custom styles in docx to markdown conversion.
Date: Thu, 16 Dec 2021 14:06:19 +0000
Message-ID: <YbtH2yG9dD+fbURH@localhost>
In-Reply-To: <87czlwhel7.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>
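As for your second question, what you want is to return the content of the span without the enclosing Span element itself. Your attempts failed because a Span's content is already a list of Inlines (so `blocks_to_inlines` does not apply), and a Span may only be replaced by inline elements, not by a Para or Plain block. Returning `el.content` splices those Inlines into the surrounding text. So: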
```
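function Span(el)
  -- Not every Span has a custom-style attribute; calling :match on nil fails.
  local style = el.attributes['custom-style']
  if style and style:match('XYZ Body Text Char') then
    return el.content
  end
end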
```
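If you end up discarding several character styles, or a style name contains characters that are special in Lua patterns (such as `-`, `.`, `(` or `%`), comparing the attribute against a table of exact names may be safer than `:match`, which treats its argument as a pattern and also matches substrings (e.g. a hypothetical "XYZ Body Text Char Bold"). A minimal sketch; apart from the style name from your message, the contents of the table are up to you:

```lua
-- Character styles whose Spans should be dropped (keep their text).
-- Only 'XYZ Body Text Char' comes from this thread; add others as needed.
local unwrap_styles = {
  ['XYZ Body Text Char'] = true,
}

function Span(el)
  -- Exact lookup: no pattern magic, no substring matches. A Span without
  -- a custom-style attribute looks up nil, which simply yields nil.
  if unwrap_styles[el.attributes['custom-style']] then
    return el.content
  end
end
```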
The solution to "get rid of any OrderedList that immediately contains a Major/Minor Head, but leave 'normal' OrderedLists intact" is similar: since the content of an OrderedList is a list of items, each of which is a list of Blocks, you want to replace the list with a Header built from the content of the only Block in the first item. This may work:
```
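function OrderedList(el)
  -- el.content is a list of items; each item is a list of Blocks.
  local firstItem = el.content[1]
  local possibleHeader = firstItem and firstItem[1]
  if possibleHeader and possibleHeader.t == 'Div' then
    -- Not every Div has a custom-style attribute.
    local style = possibleHeader.attributes['custom-style']
    if style and style:match('XYZ Minor Head') then
      return pandoc.Header(2, pandoc.utils.blocks_to_inlines(possibleHeader.content))
    end
  end
end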
```
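One caveat, assuming the OrderedList function lives in the same filter as the Div function from your message: by default pandoc applies a filter's Block functions bottom-up, so the heading Div inside a list item has usually already become a Header by the time OrderedList is called, and the `t == 'Div'` test above would then never succeed. The sketch below works with that order instead: it converts the Divs first and then unwraps any OrderedList whose items each consist of a single Header, keeping every item rather than only the first. The style names are the ones from this thread, matched exactly; treating "every item is a lone Header" as the sign of a heading list is my assumption about your documents:

```lua
-- Heading styles from this thread (matched exactly); adjust as needed.
local heading_levels = {
  ['XYZ Major Head'] = 1,
  ['XYZ Minor Head'] = 2,
}

-- Turn heading-styled Divs into Headers. A Div without a custom-style
-- looks up nil in the table, which simply yields nil.
function Div(el)
  local level = heading_levels[el.attributes['custom-style']]
  if level then
    return pandoc.Header(level, pandoc.utils.blocks_to_inlines(el.content))
  end
end

-- Called after the Divs inside the list were converted (bottom-up).
-- Replace a list whose items are all a single Header by those Headers.
function OrderedList(el)
  if #el.content == 0 then
    return nil
  end
  local headers = {}
  for _, item in ipairs(el.content) do
    if #item ~= 1 or item[1].t ~= 'Header' then
      return nil -- a "normal" list: leave it unchanged
    end
    headers[#headers + 1] = item[1]
  end
  return headers -- a list of Blocks is spliced in place of the list
end
```

With `-f docx+styles --lua-filter=` pointing at this file, a list that mixes headings with ordinary paragraphs is left alone; note that it would also unwrap a list built from ordinary heading-styled paragraphs, if your documents contain any.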
On Thursday 16 December 2021 at 01:28:18 PM, Joost Kremers wrote:
>
> On Fri, Dec 10 2021, John MacFarlane wrote:
> >> Does that help?
> >
> > Yeah, that's enough information for me.
> >
> > What you need to do is to write a Lua filter like this:
> >
> > function Div(el)
> > if el.attributes['custom-style']:match('XYZ Minor Head') then
> > return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
> > end
> > end
> >
> > Hope it's clear what this does.
>
> For some reason, it doesn't work... I tried to extend your filter to the
> following:
>
> ```
> function Div(el)
> if el.attributes['custom-style']:match('XYZ Major Head') then
> return pandoc.Header(1, pandoc.utils.blocks_to_inlines(el.content))
> elseif el.attributes['custom-style']:match('XYZ Minor Head') then
> return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
> elseif el.attributes['custom-style']:match('XYZ Body Text') then
> return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
> end
> end
> ```
>
> Using this filter, the custom style 'XYZ Body Text' is converted, but the Major
> and Minor Heads are not. When I convert to native (without the filter), I don't
> see a difference between Body Text on the one hand and Major or Minor Heads on
> the other: both are Div elements with "custom-style" set as indicated. Only the
> body text is changed, the headers are not.
>
> Could the problem be that the header Divs tend to appear inside an OrderedList?
> For some strange reason, the Major and Minor Heads don't use numbering. Instead,
> each header is an item in a numbered list... Is there a way to clean up such
> cases? I.e., get rid of any OrderedList that immediately contains a Major/Minor
> Head, but leave "normal" OrderedLists intact?
>
> Another question: body text in the converted document is often enclosed in a
> Span with a specific custom-style. I'd like to get rid of the span, since the
> style is of no interest to me, but I'm not sure what I should have the function
> return. For example, the following:
>
> ```
> function Span(el)
> if el.attributes['custom-style']:match('XYZ Body Text Char') then
> return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
> end
> end
> ```
>
> raises an error. I also tried converting to Plain (honestly, I don't know what
> the correct type would be), and I tried just passing `el.content` to
> `pandoc.Para`, but I keep getting errors. (Specifically: "Block
> expected, got userdata", and also "table expected, got userdata" with Plain
> instead of Para.)
>
> I apologise for what is probably a barrage of newbie questions, but having no
> previous knowledge of Lua and only a vague understanding of Pandoc's internal
> data types, I have a hard time figuring things out from the documentation.
>
> I appreciate any pointers.
>
> TIA
>
> --
> Joost Kremers
> Life has its moments
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/87czlwhel7.fsf%40fastmail.fm.