From: Ben Menashe <benm5678-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: docx -> gfm with custom styles
Date: Mon, 20 Feb 2023 13:24:14 -0800 (PST)
Message-ID: <32f5e6dc-baa9-4d92-a351-29bfacb7c38dn@googlegroups.com>
In-Reply-To: <085f9581-c85a-4511-ad94-ec9bca0ab8c8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


Hi, could I please get a bit more related advice?
1) See the implementation below. Is this the cleanest approach to handling 
errors? I'm trying to find a way to print debug info only when an error 
occurs.
2) Is there a way to access the element's position in the AST or input file, 
so I can print it in the log as well?

```
return {
  {
    Div = function (div)
      -- Run the conversion in protected mode so failures can be reported.
      local success, res = pcall(function ()
        local style = div.attributes['custom-style']
        if style == 'Internal Heading' then
          return pandoc.Header(2, div.content[1].content)
        end
        if style == 'Example' then
          local inner = div.content[1].content
          -- A BlockQuote holds Blocks; unwrap one more level to reach the Inlines.
          if pandoc.utils.type(inner) == 'Blocks' then
            inner = inner[1].content
          end
          return pandoc.Header(2, inner)
        end
        return div
      end)

      -- Quit processing on error; write debug info to stderr, not stdout.
      if not success then
        io.stderr:write('Failed to process element!\nError: ', tostring(res),
                        '\nElement: ', tostring(div), '\n')
        error('Processing Exception!')
      end
      return res
    end,
  }
}
```
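
One way the pcall wrapper above could be factored out, so the same error reporting applies to any element function. This is only a sketch under assumptions: `guarded` is a hypothetical helper name, not part of pandoc's API, and the script relies on pandoc collecting the global `Div` function when the filter file returns nothing. Debug output goes to stderr so it cannot end up in the converted document.

```lua
-- Hypothetical helper (not a pandoc function): wraps a filter function so
-- that any error is reported together with the element that caused it.
local function guarded(name, fn)
  return function (elem)
    local ok, res = pcall(fn, elem)
    if ok then
      return res
    end
    -- Print debug info only in the error case.
    io.stderr:write(string.format(
      'Failed in %s filter\nError: %s\nElement: %s\n',
      name, tostring(res), tostring(elem)))
    -- Level 0 keeps the message free of file/line prefixes.
    error('Processing exception in ' .. name, 0)
  end
end

-- Global element function, wrapped by the helper.
Div = guarded('Div', function (div)
  if div.attributes['custom-style'] == 'Internal Heading' then
    return pandoc.Header(2, div.content[1].content)
  end
  return div
end)
```

Other element functions (for example `Para`) could be wrapped the same way with `guarded('Para', ...)`.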

On Saturday, February 18, 2023 at 1:15:41 PM UTC-7 Ben Menashe wrote:

> Oh sorry, there are a few with the same string. The one it fails on shows in 
> gfm like this (w/o Lua filter):
>
> ```
> <div custom-style="Example">
>
> > *Example666:*
>
> </div>
> ```
>
> Hmm, I guess I see the issue: converting it to "## > ..." would not work. 
> OK, perhaps a header isn't the right approach for the Example style. At 
> least I now have some idea how to work with it and print out debug info. 
> Thanks for your guidance!
>
> On Saturday, February 18, 2023 at 12:59:21 PM UTC-7 Bastien DUMONT wrote:
>
>> Sorry, I don't understand why there is a block quote in native output and 
>> not in gfm. However, has a style been applied to the "Example:" string in 
>> the DOCX file that triggers it to be parsed as a block quote? 
>>
>> On Saturday 18 February 2023 at 11:46:56AM, Ben Menashe wrote: 
>> > If I take out the Lua filter, I can see this is the result in the md 
>> > file. I'm not sure either why it's not working. 
>> > 
>> > <div custom-style="Example"> 
>> > 
>> > *Example:* 
>> > 
>> > </div> 
>> > 
>> > On Saturday, February 18, 2023 at 12:39:39 PM UTC-7 Bastien DUMONT wrote: 
>> > 
>> > Inline formatting should be preserved without any problem. The cause of 
>> > the trouble here is that you have a block quote inside what should be 
>> > your header, which IMO does not make sense. 
>> > 
>> > On Saturday 18 February 2023 at 11:26:01AM, Ben Menashe wrote: 
>> > > Hmm, yeah, I see. It's part of a large original docx, so it was failing 
>> > > on another element styled as Example that had italics applied. 
>> > > I printed the div in the Lua filter. When it works I see this: 
>> > > 
>> > > ``` 
>> > > Div ("",[],[("custom-style","Example")]) [Para [Str "Test",Space,Str "example"]] 
>> > > ``` 
>> > > 
>> > > and when it fails, this: 
>> > > 
>> > > ``` 
>> > > Div ("",[],[("custom-style","Example")]) [BlockQuote [Para [Emph [Str "Example:"]]]] 
>> > > ``` 
>> > > 
>> > > 
>> > > Is there any clean way to approach this so it will work in a generic 
>> > > way and preserve any other formatting applied? 
>> > > 
>> > > On Saturday, February 18, 2023 at 1:19:46 AM UTC-7 Bastien DUMONT wrote: 
>> > > 
>> > > With your examples, I get: 
>> > > 
>> > > ## Scope 
>> > > 
>> > > <div custom-style="Body Text"> 
>> > > 
>> > > Test body 
>> > > 
>> > > </div> 
>> > > 
>> > > ## Test nested 
>> > > 
>> > > On Friday 17 February 2023 at 07:00:47AM, Ben Menashe wrote: 
>> > > > Thank you so much, that worked. I was missing the [1].content. 
>> > > > But let's say I have another 'Example' custom style under it. W/o the 
>> > > > Lua filter it renders this structure: 
>> > > > 
>> > > > ``` 
>> > > > <div custom-style="Internal Heading"> 
>> > > > 
>> > > > Scope 
>> > > > 
>> > > > </div> 
>> > > > 
>> > > > <div custom-style="Body Text"> 
>> > > > 
>> > > > Test body 
>> > > > 
>> > > > </div> 
>> > > > 
>> > > > <div custom-style="Example"> 
>> > > > 
>> > > > Test nested 
>> > > > 
>> > > > </div> 
>> > > > ``` 
>> > > > 
>> > > > And with the filter below it fails on line 8 with this error: "Inline, 
>> > > > list of Inlines, or string expected, got Blocks". Any idea how to 
>> > > > troubleshoot such issues? 
>> > > > 
>> > > > ``` 
>> > > > return { 
>> > > >   { 
>> > > >     Div = function (div) 
>> > > >       if (div.attributes['custom-style'] == 'Internal Heading') then 
>> > > >         return pandoc.Header(2, div.content[1].content) 
>> > > >       end 
>> > > >       if (div.attributes['custom-style'] == 'Example') then 
>> > > >         return pandoc.Header(2, div.content[1].content) 
>> > > >       end 
>> > > > 
>> > > >       return div 
>> > > >     end, 
>> > > >   } 
>> > > > } 
>> > > > ``` 
>> > > > On Friday, February 17, 2023 at 1:10:11 AM UTC-7 Bastien DUMONT wrote: 
>> > > > 
>> > > > In this case, it would be preferable to turn the div into a Header 
>> > > > element and let Pandoc format it itself: 
>> > > > 
>> > > > ``` 
>> > > > function Div(div) 
>> > > >   if div.attributes['custom-style'] == 'Internal Heading' then 
>> > > >     return pandoc.Header(2, div.content[1].content) 
>> > > >   end 
>> > > > end 
>> > > > ``` 
>> > > > 
>> > > > On Thursday 16 February 2023 at 08:00:08PM, Ben Menashe wrote: 
>> > > > > Hi, 
>> > > > > We need to convert docx to gfm. 
>> > > > > Since the docx has some user-defined styles, we use the "+styles" 
>> > > > > extension: 
>> > > > > 
>> > > > > 
>> > > > > pandoc --to=gfm -f docx+styles --output=rtb.md --extract-media=. --wrap=none 'rtb.docx' 
>> > > > > 
>> > > > > 
>> > > > > So now we have an HTML div that wraps our content. Let's say I want 
>> > > > > to transform this: 
>> > > > > 
>> > > > > <div custom-style="Internal Heading"> 
>> > > > > 
>> > > > > Scope 
>> > > > > 
>> > > > > </div> 
>> > > > > 
>> > > > > Into: 
>> > > > > 
>> > > > > ## Scope 
>> > > > > 
>> > > > > How can it be done? I tried to set up a Lua filter but have had no 
>> > > > > success getting it to output "##" along with the div content. 
>> > > > > 
>> > > > > 
>> > > > 
>> > > > 
>> > > 
>> > > 
>> > 
>> > 
>>
>>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/32f5e6dc-baa9-4d92-a351-29bfacb7c38dn%40googlegroups.com.


Thread overview: 10+ messages
2023-02-17  4:00 Ben Menashe
     [not found] ` <3909f520-e8db-4cf9-900d-6a5a858c1a18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-17  8:10   ` Bastien DUMONT
2023-02-17 15:00     ` Ben Menashe
     [not found]       ` <52ada5c3-e26e-4c8c-8b3f-b55bb8ce8e1en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-18  8:19         ` Bastien DUMONT
2023-02-18 19:26           ` Ben Menashe
     [not found]             ` <ef5a0088-1df4-4540-98d5-a0120df8f3cen-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-18 19:39               ` Bastien DUMONT
2023-02-18 19:46                 ` Ben Menashe
     [not found]                   ` <5aeae8ad-aec8-4f00-b51c-9ffddf8c112fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-18 19:59                     ` Bastien DUMONT
2023-02-18 20:15                       ` Ben Menashe
     [not found]                         ` <085f9581-c85a-4511-ad94-ec9bca0ab8c8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-02-20 21:24                           ` Ben Menashe [this message]
