public inbox archive for pandoc-discuss@googlegroups.com
From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: docx -> gfm with custom styles
Date: Sat, 18 Feb 2023 19:59:15 +0000
Message-ID: <Y/EuE+JURYRtLlNP@localhost>
In-Reply-To: <5aeae8ad-aec8-4f00-b51c-9ffddf8c112fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

Sorry, I don't understand why there is a block quote in the native output but not in the gfm output. Has a style been applied to the "Example:" string in the DOCX file that could cause it to be parsed as a block quote?
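
Whatever the cause turns out to be, the filter can probably be made tolerant of the extra nesting. Here is a minimal sketch, assuming that every Div styled 'Internal Heading' or 'Example' should become a level-2 header. It uses `pandoc.utils.blocks_to_inlines` to flatten the Div's blocks (a Para, or a Para wrapped in a BlockQuote) into a list of inlines, so emphasis and other inline formatting should be kept. The `heading_styles` table is just an illustrative name, and I have not run this against your file:

```lua
-- Custom styles that should become level-2 headers.
-- The table name is illustrative; the style names come from this thread.
local heading_styles = {
  ['Internal Heading'] = true,
  ['Example'] = true,
}

function Div(div)
  if heading_styles[div.attributes['custom-style']] then
    -- Flatten all nested blocks (Para, BlockQuote, ...) into inlines,
    -- so Emph and other inline formatting are carried into the header.
    local inlines = pandoc.utils.blocks_to_inlines(div.content)
    return pandoc.Header(2, inlines)
  end
  -- Returning nil leaves every other Div unchanged.
end
```

Unlike `div.content[1].content`, this does not depend on the first block being a Para, which is what triggered the "got Blocks" error. If a styled Div contains several paragraphs, they would all be merged into one header, so you may want to restrict it further.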

On Saturday 18 February 2023 at 11:46:56AM, Ben Menashe wrote:
> If I take out the Lua filter, this is the result in the md file... I'm
> not sure either why it's not working.
> 
> <div custom-style="Example">
> 
> *Example:*
> 
> </div>
> 
> On Saturday, February 18, 2023 at 12:39:39 PM UTC-7 Bastien DUMONT wrote:
> 
>     Inline formatting should be preserved without any problem. The cause of the
>     trouble here is that you have a block quote inside what should be your
>     header, which IMO does not make sense.
> 
>     On Saturday 18 February 2023 at 11:26:01AM, Ben Menashe wrote:
>     > Hmm, yea, I see -- it's part of a large original docx, so it was failing on
>     > another element that was styled as Example and had italics applied.
>     > I printed the div in the Lua filter; when it works I see this:
>     >
>     > ```
>     > Div ("",[],[("custom-style","Example")]) [Para [Str "Test",Space,Str "example"]]
>     > ```
>     >
>     > and when it fails, this:
>     >
>     > ```
>     > Div ("",[],[("custom-style","Example")]) [BlockQuote [Para [Emph [Str "Example:"]]]]
>     > ```
>     >
>     >
>     > Is there any clean way to approach this so it will work in a generic way and
>     > preserve any other formatting applied?
>     >
>     > On Saturday, February 18, 2023 at 1:19:46 AM UTC-7 Bastien DUMONT wrote:
>     >
>     > With your examples, I get:
>     >
>     > ## Scope
>     >
>     > <div custom-style="Body Text">
>     >
>     > Test body
>     >
>     > </div>
>     >
>     > ## Test nested
>     >
>     > On Friday 17 February 2023 at 07:00:47AM, Ben Menashe wrote:
>     > > Thank you so much...that worked - I was missing the [1].content.
>     > > But let's say I have another 'Example' custom style under it... w/o the
>     > > Lua filter it renders this structure:
>     > >
>     > > ```
>     > > <div custom-style="Internal Heading">
>     > >
>     > > Scope
>     > >
>     > > </div>
>     > >
>     > > <div custom-style="Body Text">
>     > >
>     > > Test body
>     > >
>     > > </div>
>     > >
>     > > <div custom-style="Example">
>     > >
>     > > Test nested
>     > >
>     > > </div>
>     > > ```
>     > >
>     > > And with the filter below it fails on line 8 w/ this error "Inline, list of
>     > > Inlines, or string expected, got Blocks"... any idea on how to troubleshoot
>     > > such issues?:
>     > >
>     > > ```
>     > > return {
>     > >   {
>     > >     Div = function (div)
>     > >       if (div.attributes['custom-style'] == 'Internal Heading') then
>     > >         return pandoc.Header(2, div.content[1].content)
>     > >       end
>     > >       if (div.attributes['custom-style'] == 'Example') then
>     > >         return pandoc.Header(2, div.content[1].content)
>     > >       end
>     > >
>     > >       return div
>     > >     end,
>     > >   }
>     > > }
>     > > ```
>     > > On Friday, February 17, 2023 at 1:10:11 AM UTC-7 Bastien DUMONT wrote:
>     > >
>     > > In this case, it would be preferable to turn the div into a Header element
>     > > and let Pandoc format it itself:
>     > >
>     > > ```
>     > > function Div(div)
>     > >   if div.attributes['custom-style'] == 'Internal Heading' then
>     > >     return pandoc.Header(2, div.content[1].content)
>     > >   end
>     > > end
>     > > ```
>     > >
>     > > On Thursday 16 February 2023 at 08:00:08PM, Ben Menashe wrote:
>     > > > Hi,
>     > > > We have a need to convert docx to gfm.
>     > > > Since docx has some user-defined styles we use this "+styles" extension:
>     > > >
>     > > >
>     > > > pandoc --to=gfm -f docx+styles --output=rtb.md --extract-media=. --wrap=none 'rtb.docx'
>     > > >
>     > > >
>     > > > So now we have an HTML div that wraps our content. Let's say I want to
>     > > > transform this:
>     > > >
>     > > > <div custom-style="Internal Heading">
>     > > >
>     > > > Scope
>     > > >
>     > > > </div>
>     > > >
>     > > > Into:
>     > > >
>     > > > ## Scope
>     > > >
>     > > > How can it be done? I tried to set up a Lua filter but have had no success
>     > > > getting it to output "##" along with the div content.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/Y/EuE%2BJURYRtLlNP%40localhost.


Thread overview: 10+ messages
2023-02-17  4:00 Ben Menashe
     [not found] ` <3909f520-e8db-4cf9-900d-6a5a858c1a18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-17  8:10   ` Bastien DUMONT
2023-02-17 15:00     ` Ben Menashe
     [not found]       ` <52ada5c3-e26e-4c8c-8b3f-b55bb8ce8e1en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-18  8:19         ` Bastien DUMONT
2023-02-18 19:26           ` Ben Menashe
     [not found]             ` <ef5a0088-1df4-4540-98d5-a0120df8f3cen-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-18 19:39               ` Bastien DUMONT
2023-02-18 19:46                 ` Ben Menashe
     [not found]                   ` <5aeae8ad-aec8-4f00-b51c-9ffddf8c112fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-18 19:59                     ` Bastien DUMONT [this message]
2023-02-18 20:15                       ` Ben Menashe
     [not found]                         ` <085f9581-c85a-4511-ad94-ec9bca0ab8c8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-20 21:24                           ` Ben Menashe
