public inbox archive for pandoc-discuss@googlegroups.com
From: Joost Kremers <joostkremers-97jfqw80gc6171pxa8y+qA@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Cc: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
Subject: Re: Custom styles in docx to markdown conversion.
Date: Thu, 16 Dec 2021 13:28:18 +0100
Message-ID: <87czlwhel7.fsf@fastmail.fm>
In-Reply-To: <yh480ka6h8t35a.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>


On Fri, Dec 10 2021, John MacFarlane wrote:
>> Does that help?
>
> Yeah, that's enough information for me.
>
> What you need to do is to write a Lua filter like this:
>
> function Div(el)
>   if el.attributes['custom-style']:match('XYZ Minor Head') then
>     return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
>   end
> end
>
> Hope it's clear what this does.

For some reason, it doesn't work. I extended your filter as follows:

```
function Div(el)
  if el.attributes['custom-style']:match('XYZ Major Head') then
    return pandoc.Header(1, pandoc.utils.blocks_to_inlines(el.content))
  elseif el.attributes['custom-style']:match('XYZ Minor Head') then
    return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
  elseif el.attributes['custom-style']:match('XYZ Body Text') then
    return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
  end
end
```

Using this filter, the custom style 'XYZ Body Text' is converted, but the Major
and Minor Heads are not. Yet when I convert to native (without the filter), I
don't see any difference between Body Text on the one hand and Major or Minor
Heads on the other: both are Div elements with "custom-style" set as indicated.
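One thing that may be worth ruling out (a guess, not a diagnosis): `:match` interprets its argument as a Lua pattern, in which characters such as `-`, `.` or `%` are special, and it raises an error on any Div without a `custom-style` attribute, because the value is then `nil`. A minimal sketch of a more defensive variant, assuming the attribute values are exactly 'XYZ Major Head', 'XYZ Minor Head' and 'XYZ Body Text' (`STYLE_TO_BLOCK` is a hypothetical name), compares style names literally through a table lookup:

```lua
-- Map each (assumed) custom-style name to a function that builds
-- the replacement block from the Div's contents.
local STYLE_TO_BLOCK = {
  ['XYZ Major Head'] = function(inlines) return pandoc.Header(1, inlines) end,
  ['XYZ Minor Head'] = function(inlines) return pandoc.Header(2, inlines) end,
  ['XYZ Body Text']  = function(inlines) return pandoc.Para(inlines) end,
}

function Div(el)
  -- nil when the Div has no custom-style attribute
  local style = el.attributes['custom-style']
  -- Plain table lookup: exact string comparison, no Lua patterns involved.
  local make = style and STYLE_TO_BLOCK[style]
  if make then
    return make(pandoc.utils.blocks_to_inlines(el.content))
  end
  -- Returning nothing leaves the Div unchanged.
end
```

Unlike `:match`, the lookup is an exact comparison, so a style value with extra text around the name will not be caught.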

Could the problem be that the header Divs tend to appear inside an OrderedList?
For some strange reason, the Major and Minor Heads aren't numbered as headings.
Instead, each header is an item in a numbered list... Is there a way to clean up
such cases, i.e., get rid of any OrderedList that immediately contains a
Major/Minor Head, but leave "normal" OrderedLists intact?
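A rough sketch of the list cleanup, assuming that a list should be dropped whenever any of its items begins with a head-styled Div, or with a Header in case a Div filter in the same file has already converted it: the `OrderedList` function returns the blocks of all items in order, which replaces the list with its contents and discards the numbering. `HEAD_STYLES` and `starts_with_head` are hypothetical names, and the style names are assumed to match exactly.

```lua
-- Custom styles (assumed names) that mark a heading paragraph.
local HEAD_STYLES = {
  ['XYZ Major Head'] = true,
  ['XYZ Minor Head'] = true,
}

-- True if a list item (a list of blocks) begins with a heading:
-- either a head-styled Div or a Header produced by another filter.
local function starts_with_head(item)
  local first = item[1]
  if first == nil then return false end
  if first.t == 'Header' then return true end
  return first.t == 'Div' and HEAD_STYLES[first.attributes['custom-style']] == true
end

function OrderedList(el)
  local has_head = false
  for _, item in ipairs(el.content) do
    if starts_with_head(item) then
      has_head = true
      break
    end
  end
  if not has_head then
    return nil -- leave "normal" ordered lists intact
  end
  -- Splice the blocks of every item into the surrounding block list,
  -- dropping the list wrapper (and its numbering) altogether.
  local blocks = {}
  for _, item in ipairs(el.content) do
    for _, blk in ipairs(item) do
      blocks[#blocks + 1] = blk
    end
  end
  return blocks
end
```

Because the Header check counts any item that starts with a heading, this could also unwrap a genuine list whose items begin with headings; dropping that check makes the test stricter but relies on the Divs not having been converted yet.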

Another question: body text in the converted document is often enclosed in a
Span with a specific custom-style. I'd like to get rid of the span, since the
style is of no interest to me, but I'm not sure what I should have the function
return. For example, the following:

```
function Span(el)
  if el.attributes['custom-style']:match('XYZ Body Text Char') then
    return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
  end
end
```

raises an error. I also tried converting to Plain (honestly, I don't know what
the correct type would be), and I tried just passing `el.content` to
`pandoc.Para`, but I keep getting errors. (Specifically: "Block
expected, got userdata", and also "table expected, got userdata" with Plain
instead of Para.)
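For the Span question, a minimal sketch, assuming the attribute value is exactly 'XYZ Body Text Char': a Span is an inline element and its `content` is already a list of inlines, so neither `blocks_to_inlines` nor a block constructor such as `pandoc.Para` fits there. Pandoc's Lua filter documentation states that a function acting on an inline element may return nil, an inline, or a list of inlines; returning the Span's own content should therefore put its inlines in place of the Span.

```lua
function Span(el)
  -- Only unwrap spans carrying this (assumed) custom style; a plain
  -- comparison also avoids calling :match on a nil attribute value.
  if el.attributes['custom-style'] == 'XYZ Body Text Char' then
    -- A list of inlines returned from an inline filter replaces the
    -- Span with its contents, dropping the wrapper and its attributes.
    return el.content
  end
end
```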

I apologise for what is probably a barrage of newbie questions, but having no
previous knowledge of Lua and only a vague understanding of Pandoc's internal
data types, I have a hard time figuring things out from the documentation.

I appreciate any pointers.

TIA

-- 
Joost Kremers
Life has its moments



Thread overview: 7+ messages
2021-12-10 12:07 Joost Kremers
2021-12-10 16:56   ` John MacFarlane
2021-12-10 19:39       ` Joost Kremers
2021-12-11  0:07           ` John MacFarlane
2021-12-15  9:47               ` Joost Kremers
2021-12-16 12:28               ` Joost Kremers [this message]
2021-12-16 14:06                   ` Bastien DUMONT
