Hi Albert!

Thanks for the info! We actually have basically the same use case! Definitely will check out transpect!

Thanks,

Noah

On Thursday, June 16, 2022 at 2:21:37 AM UTC-5 Albert Krewinkel wrote:
Hi Noah,

Just chiming in to report on our experiences with tables in a project
where we used pandoc to publish journal articles. Our main goal there
was to publish HTML and PDFs from Docx inputs, with an option to handle
JATS as well (Project: <https://oa-pub.hos.tuhh.de/en/project/> Journal:
<https://kommunikation-gesellschaft.de>).

We found that authors writing in Word essentially use tables as a
graphic and layout tool. Markdown was used as our central format,
which worked extremely well: we converted Docx -> Markdown, fixed markup
when necessary, then published via pandoc. Only tables proved
problematic. For some tables, we ended up writing separate HTML and PDFs
by hand. See the "Sonderausgabe | Podcast" in that journal for results.
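
For illustration only, the Docx -> Markdown -> HTML round trip described
above could be scripted roughly as follows. This is a minimal sketch, not
the project's actual setup: the helper names `pandoc_commands` and `run`
and the file names are hypothetical, and only standard pandoc options
(`--from`, `--to`, `--standalone`, `--output`) are assumed.

```python
import subprocess
from pathlib import Path


def pandoc_commands(docx: Path) -> list[list[str]]:
    """Build two pandoc invocations: Docx -> Markdown, then Markdown -> HTML.

    Hypothetical helper; output files are placed next to the input file.
    """
    markdown = docx.with_suffix(".md")
    html = docx.with_suffix(".html")
    return [
        ["pandoc", str(docx), "--from", "docx", "--to", "markdown",
         "--output", str(markdown)],
        # Manual markup fixes to the Markdown file would happen between
        # these two steps.
        ["pandoc", str(markdown), "--from", "markdown", "--to", "html",
         "--standalone", "--output", str(html)],
    ]


def run(docx: Path) -> None:
    """Run both steps in order; requires pandoc to be installed and on PATH."""
    for command in pandoc_commands(docx):
        subprocess.run(command, check=True)
```

In practice the two steps would be run separately, since the intermediate
Markdown is where the manual fixes happen.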

This is just to say that pandoc may not be the right tool if you aim for
*fully automatic* conversion of scholarly Docx articles. Maybe tables
should just be expected to require manual tuning.

I believe that [transpect](https://transpect.github.io) tries to
preserve more of the styling; maybe it is more in line with what you
need? Citation support isn't as complete, though (last I heard).

Happy to answer questions about any of this.

Cheers,
Albert


Noah Malmed <nma...@scholasticahq.com> writes:

> Hello!
>
> We use Pandoc often to convert from docx to HTML, and many of the
> documents we convert include tables. As far as we can tell, almost all
> of the table styling is lost in the docx reader. Specifically, we care
> about 5 things:
>
> 1. Text justification (left, center, or right)
>
> 2. Vertical alignment (top, middle, or bottom)
>
> 3. Text indentation
>
> 4. Cell shading and text color
>
> 5. Table borders
>
> We hope to enhance the docx reader so that these stylings get preserved
> in the AST.
>
> Proposed solutions:
>
> 1. It seems like text justification already exists in the AST through
> the Alignment value. It just needs to get implemented in the docx
> reader, as described in this issue:
> https://github.com/jgm/pandoc/issues/6316
>
> 2. Add the vertical alignment style to attributes as suggested here
>
> 3. Add text indentation to attributes in the form of the style
> padding-left
>
> 4. Add cell shading and text color to attributes in the form of the
> styles background-color and color
>
> 5. Add table borders to attributes in the form of the style border
>
>
> Does this sound like a sane and feasible solution? We're pretty
> motivated and willing to work on these changes, just want to know if
> they would be the best route!
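
As a rough illustration of points 2 to 5 in the quoted proposal, the
sketch below folds per-cell styling into a single `style` attribute
value. The `CellStyle` record, its field names, and the function
`to_style_attribute` are hypothetical and not part of pandoc or its docx
reader. The CSS property names `padding-left`, `background-color`,
`color`, and `border` come from the proposal; `vertical-align` is an
assumption, since the proposal does not name the property it would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellStyle:
    """Hypothetical container for styling read from a docx table cell."""
    vertical_align: Optional[str] = None    # "top", "middle", or "bottom"
    padding_left: Optional[str] = None      # text indentation, e.g. "12pt"
    background_color: Optional[str] = None  # cell shading, e.g. "#eeeeee"
    color: Optional[str] = None             # text color
    border: Optional[str] = None            # e.g. "1px solid black"


def to_style_attribute(cell: CellStyle) -> str:
    """Serialize the fields that are set as a CSS declaration list."""
    pairs = [
        ("vertical-align", cell.vertical_align),  # assumed property name
        ("padding-left", cell.padding_left),
        ("background-color", cell.background_color),
        ("color", cell.color),
        ("border", cell.border),
    ]
    return "; ".join(
        f"{name}: {value}" for name, value in pairs if value is not None
    )
```

For example, `to_style_attribute(CellStyle(vertical_align="middle",
background_color="#eeeeee"))` yields
`"vertical-align: middle; background-color: #eeeeee"`, which a reader
could store as the value of a `style` key in a cell's attributes.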


--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/ad6d8a3c-2e96-46e6-af3c-7370801f67c6n%40googlegroups.com.