public inbox archive for pandoc-discuss@googlegroups.com
* Sometimes markdown output tables are HTML
@ 2022-03-21 16:49 Paul Close
From: Paul Close @ 2022-03-21 16:49 UTC (permalink / raw)
  To: pandoc-discuss


Hi all,

I am using pandoc to convert MS Word documents to gfm (GitHub-Flavored 
Markdown) and have run into a problem: some tables are output as HTML 
instead of the expected markdown. With grid_tables enabled there is no 
problem, but because I need gfm output I have to stick with pipe_tables.

It appears to be related to table width, but neither --wrap=none nor 
--columns=1000 changed the output.

Is there some way I can force output of pipe_tables, even if they are ugly 
or overly long? If not, might it be possible to write a lua script to force 
table input into a pipe_table output? Ideally I'd like some way to control 
(or even know) when pandoc would decide to output HTML.

Thanks for any thoughts!

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fc81988f-18f9-45eb-81e9-526a5507140fn%40googlegroups.com.


* Re: Sometimes markdown output tables are HTML
@ 2022-03-22 14:55   ` Paul Close
From: Paul Close @ 2022-03-22 14:55 UTC (permalink / raw)
  To: pandoc-discuss



An update: while building small examples to reproduce the problem, I noticed 
that the HTML tables appear to be caused by cells containing multiple lines. 
That explains why grid tables work but pipe tables do not.

I am working around this with a filter that joins the lines with 
RawInline('html', '<br/>').
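
For anyone trying to find out which tables are affected, here is a minimal sketch of a diagnostic filter. It rests on an assumption drawn only from what I observed, not on pandoc's documented rule: a cell is treated as safe for pipe tables when it is empty or holds exactly one Plain or Para block. The helper names is_pipe_safe and count_unsafe are made up for illustration.

```lua
-- Heuristic only (an assumption, not pandoc's actual decision rule):
-- a cell is "pipe-safe" if it is empty or holds one Plain or Para block.
function is_pipe_safe(blocks)
    if #blocks == 0 then return true end
    if #blocks > 1 then return false end
    local tag = blocks[1].t
    return tag == "Plain" or tag == "Para"
end

-- Count the cells in a list of rows that fail the heuristic.
function count_unsafe(rows)
    local n = 0
    for _, row in ipairs(rows) do
        for _, cell in ipairs(row.cells) do
            if not is_pipe_safe(cell.contents) then
                n = n + 1
            end
        end
    end
    return n
end

-- Report suspicious tables on stderr; the table itself is left unchanged.
function Table(elem)
    local n = count_unsafe(elem.head.rows)
    for _, body in ipairs(elem.bodies) do
        n = n + count_unsafe(body.head) + count_unsafe(body.body)
    end
    n = n + count_unsafe(elem.foot.rows)
    if n > 0 then
        io.stderr:write("Table has " .. n ..
            " cell(s) that may force HTML output\n")
    end
end
```

Running the conversion with this filter passed via --lua-filter should print one line per suspicious table, which at least narrows down where to look in the Word document.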


* Re: Sometimes markdown output tables are HTML
@ 2022-03-23 16:42       ` Paul Close
From: Paul Close @ 2022-03-23 16:42 UTC (permalink / raw)
  To: pandoc-discuss



Perhaps the final piece of the puzzle: my MS Word document had some strange 
formatting that was not visible but resulted in BlockQuotes in pandoc, which 
in turn are not valid in pipe_tables.

In case it helps someone else, here is the Lua filter I wrote to get 
markdown pipe_tables output.

-- Merge multiple paragraphs by appending subsequent paragraphs to the first
-- with an HTML <br/> separator, so markdown sees them as a single (long!) line.
function merge_cells(row)
    for cell = 1, #row.cells do
        local cell_contents = row.cells[cell].contents
        if #cell_contents > 1 then
            -- Combine the content of all blocks into the content
            -- of the first block, then clear the remaining blocks
            cell_contents[1].content = pandoc.utils.blocks_to_inlines(
                cell_contents,
                { pandoc.Space(), pandoc.RawInline('html', "<br/>"), pandoc.Space() })
            for block = 2, #cell_contents do
                cell_contents[block] = nil
            end
        end
    end
end

function Table(elem)
    -- Fix cases where TableHead is empty, if so move first table row up
    if #elem.head.rows == 0 then
        local row = table.remove(elem.bodies[1].body, 1)
        table.insert(elem.head.rows, 1, row)
    end
    -- Fix cases where multiple lines appear in a table cell since gfm/pipe
    -- tables can only handle single lines. Instead use <br/> to separate.
    for row = 1, #elem.head.rows do
        merge_cells(elem.head.rows[row])
    end
    for row = 1, #elem.bodies[1].body do
        merge_cells(elem.bodies[1].body[row])
    end
    if #elem.bodies > 1 then
        print("Warning: table with " .. #elem.bodies .. " bodies.")
    end
    -- Block quotes don't work in tables, replace with italics. A BlockQuote
    -- is a block, so the inline Emph is wrapped in a Plain block.
    return elem:walk {
        BlockQuote = function(el)
            return pandoc.Plain { pandoc.Emph(pandoc.utils.stringify(el.content)) }
        end
    }
end
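
The filter above only merges cells in the head and in the first body, and just warns about additional bodies. A possible variant that visits every row is sketched below. It is untested against real documents, and each_row and merge_row are hypothetical names. Note one deliberate difference: instead of reusing the first block, it replaces the cell contents with a single new Plain block.

```lua
-- Call fn on every row of a table: head, all bodies (including their
-- intermediate head rows), and foot.
function each_row(tbl, fn)
    for _, row in ipairs(tbl.head.rows) do fn(row) end
    for _, body in ipairs(tbl.bodies) do
        for _, row in ipairs(body.head) do fn(row) end
        for _, row in ipairs(body.body) do fn(row) end
    end
    for _, row in ipairs(tbl.foot.rows) do fn(row) end
end

-- Collapse a multi-block cell into one Plain block, joining the
-- original blocks with an HTML <br/> separator.
function merge_row(row)
    for _, cell in ipairs(row.cells) do
        if #cell.contents > 1 then
            local sep = {
                pandoc.Space(),
                pandoc.RawInline('html', '<br/>'),
                pandoc.Space(),
            }
            cell.contents = {
                pandoc.Plain(pandoc.utils.blocks_to_inlines(cell.contents, sep))
            }
        end
    end
end

function Table(elem)
    each_row(elem, merge_row)
    return elem
end
```

This sketch does not move a body row into an empty head and does not touch BlockQuotes, so it would replace only the merging part of the filter, not the whole thing.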

