From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Custom styles in docx to markdown conversion.
Date: Thu, 16 Dec 2021 14:06:19 +0000
Message-ID: <YbtH2yG9dD+fbURH@localhost>
In-Reply-To: <87czlwhel7.fsf-97jfqw80gc6171pxa8y+qA@public.gmane.org>

As for your second question, what you want is to get the content of the span, without the containing span element itself. So:

```
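function Span(el)
  -- Spans without a custom-style return nil here, so guard before matching.
  local style = el.attributes['custom-style']
  if style and style:match('XYZ Body Text Char') then
    -- Returning the inlines replaces the span with its content.
    return el.content
  end
end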
```
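
The errors quoted below come from the element types: a Span is an inline, so its filter function has to return inlines, whereas `pandoc.Para` builds a block and `pandoc.utils.blocks_to_inlines` expects blocks as input. A minimal sketch of the same filter with an exact style comparison follows; `has_style` is a hypothetical helper, not part of pandoc, and the style name is the one from this thread.

```lua
-- Hypothetical helper (not a pandoc function): true when the element
-- carries exactly the given custom-style. An exact comparison keeps
-- 'XYZ Body Text' from also matching 'XYZ Body Text Char', which a
-- substring match with string.match would do.
local function has_style(el, name)
  return el.attributes ~= nil and el.attributes['custom-style'] == name
end

function Span(el)
  if has_style(el, 'XYZ Body Text Char') then
    -- Returning the list of inlines splices them in place of the span.
    return el.content
  end
  -- Returning nothing leaves every other span unchanged.
end
```

Saved as a file (for example `unwrap-span.lua`, a name chosen here for illustration), it can be applied with `pandoc --lua-filter unwrap-span.lua`.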

The solution to "get rid of any OrderedList that immediately contains a Major/Minor Head, but leave "normal" OrderedLists intact" is similar: since the content of an OrderedList is a list of lists of Blocks, you want to return the content of the only Block in the first list of Blocks. This may work:

```
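function OrderedList(el)
  -- First block of the first list item; nil if the list is empty.
  local possibleHeader = el.content[1] and el.content[1][1]
  if possibleHeader and possibleHeader.t == 'Div' then
    local style = possibleHeader.attributes['custom-style']
    if style and style:match('XYZ Minor Head') then
      -- Replace the whole list with a level-2 header.
      return pandoc.Header(2, pandoc.utils.blocks_to_inlines(possibleHeader.content))
    end
  end
end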
```
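
The filter above looks only at the first item and only at 'XYZ Minor Head'. A possible extension, as a sketch under assumptions: it maps both styles named in this thread to header levels and replaces a list only when every item consists of exactly one such Div, so ordinary lists stay intact. The table `head_levels` and the helper `head_level` are hypothetical names, and the level assignment (Major = 1, Minor = 2) is taken from the filter quoted below.

```lua
-- Assumed mapping from the custom styles in this thread to header levels.
local head_levels = {
  ['XYZ Major Head'] = 1,
  ['XYZ Minor Head'] = 2,
}

-- Header level for a block, or nil when it is not a styled head Div.
local function head_level(block)
  if block and block.t == 'Div' then
    return head_levels[block.attributes['custom-style']]
  end
end

function OrderedList(el)
  local headers = {}
  for _, item in ipairs(el.content) do
    local level = head_level(item[1])
    -- Leave the list untouched unless every item is exactly one head Div.
    if #item ~= 1 or not level then
      return nil
    end
    headers[#headers + 1] =
      pandoc.Header(level, pandoc.utils.blocks_to_inlines(item[1].content))
  end
  -- Returning an empty list would delete the element, so an empty
  -- OrderedList is left as it is.
  if #headers > 0 then
    return headers
  end
end
```

Returning a list of blocks makes pandoc splice the headers in place of the list, so a list with several head items becomes several headers.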


On Thursday 16 December 2021 at 01:28:18PM, Joost Kremers wrote:
> 
> On Fri, Dec 10 2021, John MacFarlane wrote:
> >> Does that help?
> >
> > Yeah, that's enough information for me.
> >
> > What you need to do is to write a Lua filter like this:
> >
> > function Div(el)
> >   if el.attributes['custom-style']:match('XYZ Minor Head') then
> >     return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
> >   end
> > end
> >
> > Hope it's clear what this does.
> 
> For some reason, it doesn't work... I tried to extend your filter to the
> following:
> 
> ```
> function Div(el)
>   if el.attributes['custom-style']:match('XYZ Major Head') then
>     return pandoc.Header(1, pandoc.utils.blocks_to_inlines(el.content))
>   elseif el.attributes['custom-style']:match('XYZ Minor Head') then
>     return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
>   elseif el.attributes['custom-style']:match('XYZ Body Text') then
>     return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
>   end
> end
> ```
> 
> Using this filter, the custom style 'XYZ Body Text' is converted, but the Major
> and Minor Heads are not. When I convert to native (without the filter), I don't
> see a difference between Body Text on the one hand and Major or Minor Heads on
> the other: both are Div elements with "custom-style" set as indicated. Only the
> body text is changed, the headers are not.
> 
> Could the problem be that the header Divs tend to appear inside an OrderedList?
> For some strange reason, the Major and Minor Heads don't use numbering. Instead,
> each header is an item in a numbered list... Is there a way to clean up such
> cases? I.e., get rid of any OrderedList that immediately contains a Major/Minor
> Head, but leave "normal" OrderedLists intact?
> 
> Another question: body text in the converted document is often enclosed in a
> Span with a specific custom-style. I'd like to get rid of the span, since the
> style is of no interest to me, but I'm not sure what I should have the function
> return. For example, the following:
> 
> ```
> function Span(el)
>   if el.attributes['custom-style']:match('XYZ Body Text Char') then
>     return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
>   end
> end
> ```
> 
> raises an error. I also tried converting to Plain (honestly, I don't know what
> the correct type would be), and I tried just passing `el.content` to
> `pandoc.Para`, but I keep getting errors. (Specifically: "Block
> expected, got userdata", and also "table expected, got userdata" with Plain
> instead of Para.)
> 
> I apologise for what is probably a barrage of newbie questions, but having no
> previous knowledge of Lua and only a vague understanding of Pandoc's internal
> data types, I have a hard time figuring things out from the documentation.
> 
> I appreciate any pointers.
> 
> TIA
> 
> -- 
> Joost Kremers
> Life has its moments
> 
> -- 
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/87czlwhel7.fsf%40fastmail.fm.


