public inbox archive for pandoc-discuss@googlegroups.com
* docx -> markdown: images-in-table extracted but not written
@ 2017-08-28 19:22 Thomas Blom
       [not found] ` <295dbf64-f431-4dfa-98fc-a1089455da59-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 2+ messages in thread
From: Thomas Blom @ 2017-08-28 19:22 UTC
  To: pandoc-discuss


Hello,

The attached files demonstrate an issue in which pandoc (v1.19.2.1 on OSX 
10.12) correctly extracts two images from a table but then only creates the 
table entry for one of them in the resulting markdown.

pandoc -t markdown_strict --extract-media=test table_images.docx -o test.md

In the table_images file, the problem is noted.  In the table_images_works 
file, in which the formatting appears to be the same, the problem does not 
occur.
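
For anyone wanting to confirm the symptom without reading the output by eye, 
here is a minimal Python sketch that lists extracted media files that are never 
referenced in the converted text.  The function names are hypothetical, and it 
assumes the paths from the command above (test.md, with media extracted 
somewhere under the test directory); adjust them if your layout differs.

```python
from pathlib import Path


def unreferenced_media(markdown_text, media_names):
    """Return the media file names that never appear in the converted text."""
    return [name for name in media_names if name not in markdown_text]


def check(markdown_path="test.md", media_root="test"):
    """Compare extracted files against the converted document.

    Assumption: media_root is the directory given to --extract-media and
    markdown_path is the file given to -o in the command above.
    """
    text = Path(markdown_path).read_text(encoding="utf-8")
    names = sorted(p.name for p in Path(media_root).rglob("*") if p.is_file())
    return unreferenced_media(text, names)


if __name__ == "__main__":
    for name in check():
        print("extracted but not referenced:", name)
```

A file reported here was written to disk by pandoc but has no matching image 
entry in test.md, which is the situation described in this message.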

Can anyone explain this?  The images in all cases are 300ppi png files 
embedded in Word for Mac 2011 documents.

Thanks!
Thomas Blom

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/295dbf64-f431-4dfa-98fc-a1089455da59%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #2: table_images.docx --]
[-- Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document, Size: 236753 bytes --]

[-- Attachment #3: table_images_works.docx --]
[-- Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document, Size: 284658 bytes --]

* Re: docx -> markdown: images-in-table extracted but not written
       [not found] ` <295dbf64-f431-4dfa-98fc-a1089455da59-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-08-28 19:30   ` Thomas Blom
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Blom @ 2017-08-28 19:30 UTC
  To: pandoc-discuss


I have just solved this myself.  Word lets you choose whether a table uses 
"Table Headers", and (I know next to nothing about Word) I suppose that when 
this is turned on, pandoc treats the first row as header data.  If the rows 
that follow do not have the same number of columns, the extra header cell is 
omitted.

In the attached document that didn't work as expected, the two images that 
appear first in the table end up in the header (<th>) row.  The next row 
is a single caption and has only one column.  Consequently, the 
pandoc writer does not write the entry for the second header column.

This is just a guess, but turning off the "Table Header" checkbox in the 
Word document causes the second image to show up correctly in the markdown.
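
To check a document for this before converting, a rough Python sketch like the 
one below reads word/document.xml from the .docx and reports, for each table, 
whether a header-row marker is present and how many cells each row has.  The 
function names are hypothetical.  Which marker pandoc 1.19.2.1 actually keys on 
is an assumption on my part; the sketch looks at both the table-look "first 
row" flag (w:tblLook) and the repeat-header-row flag (w:tblHeader).

```python
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _first_row_flag(look):
    """True if a w:tblLook element asks for first-row (header) formatting."""
    if look is None:
        return False
    attr = look.get(f"{{{W}}}firstRow")
    if attr is not None:
        return attr in ("1", "true", "on")
    # Older files store the options as a hex bitmask; 0x0020 is "first row".
    val = look.get(f"{{{W}}}val")
    try:
        return bool(val) and bool(int(val, 16) & 0x0020)
    except ValueError:
        return False


def describe_tables(document_xml):
    """Summarise header markers and per-row cell counts for every table."""
    root = ET.fromstring(document_xml)
    report = []
    for tbl in root.iter(f"{{{W}}}tbl"):
        rows = tbl.findall("w:tr", NS)
        report.append({
            "first_row_flag": _first_row_flag(tbl.find("w:tblPr/w:tblLook", NS)),
            "repeat_header_row": any(
                r.find("w:trPr/w:tblHeader", NS) is not None for r in rows
            ),
            # Counts w:tc elements only; merged cells (w:gridSpan) count once.
            "cells_per_row": [len(r.findall("w:tc", NS)) for r in rows],
        })
    return report


def describe_docx(path):
    """Read word/document.xml out of a .docx file and describe its tables."""
    with zipfile.ZipFile(path) as zf:
        return describe_tables(zf.read("word/document.xml"))


if __name__ == "__main__":
    for number, info in enumerate(describe_docx("table_images.docx"), start=1):
        print(number, info)
```

A table that has a header marker set and whose first row has more cells than 
the row after it would match the case described above.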

Thanks,
Thomas



