public inbox archive for pandoc-discuss@googlegroups.com
From: Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Cc: Noah Malmed <nmalmed-O2gogPphfo5dNrB6XyqITwC/G2K4zDHf@public.gmane.org>
Subject: Re: Feature Idea: docx -> HTML table styling
Date: Thu, 16 Jun 2022 08:49:04 +0200
Message-ID: <87y1xxvzt3.fsf@zeitkraut.de>
In-Reply-To: <cf7005a8-0447-4667-acb2-c1eccbaacaden-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

Hi Noah,

Just chiming in to report on our experiences with tables in a project
where we used pandoc to publish journal articles. Our main goal there
was to publish HTML and PDFs from Docx inputs, with an option to handle
JATS as well (Project: <https://oa-pub.hos.tuhh.de/en/project/> Journal:
<https://kommunikation-gesellschaft.de>).

We found that authors writing in Word essentially use tables as a
graphics and layout tool. We used Markdown as our central format,
which worked extremely well: we converted Docx -> Markdown, fixed markup
when necessary, then published via pandoc. Only tables proved
problematic. For some tables, we ended up writing separate HTML and PDFs
by hand. See the "Sonderausgabe | Podcast" in that journal for results.
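
For illustration only, the two conversion steps could be sketched
roughly as below. This is not the project's actual tooling: the file
names and the helper functions are hypothetical, and only standard
pandoc options (--from, --to, --standalone, --output) are assumed.

```python
from pathlib import Path


def docx_to_markdown_cmd(docx: Path) -> list[str]:
    """Build the pandoc call for step 1: Docx -> Markdown (hand-edited afterwards)."""
    md = docx.with_suffix(".md")
    return ["pandoc", "--from=docx", "--to=markdown", f"--output={md}", str(docx)]


def markdown_to_html_cmd(md: Path) -> list[str]:
    """Build the pandoc call for step 2: corrected Markdown -> standalone HTML."""
    html = md.with_suffix(".html")
    return ["pandoc", "--from=markdown", "--to=html", "--standalone",
            f"--output={html}", str(md)]


# To actually run a step (requires pandoc on the PATH):
#   import subprocess
#   subprocess.run(docx_to_markdown_cmd(Path("article.docx")), check=True)
```

The manual fix-up happens between the two calls, on the generated
Markdown file; tables are the part that may still need hand-written
output.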

This is just to say that pandoc may not be the right tool if you aim for
*fully automatic* conversion of scholarly Docx articles. Maybe tables
should just be expected to require manual tuning.

I believe that [transpect](https://transpect.github.io) tries to
preserve more of the styling, so it may be more in line with what you
need. Citation support isn't as complete, though (last I heard).

Happy to answer questions about any of this.

Cheers,
Albert


Noah Malmed <nmalmed-O2gogPphfo5dNrB6XyqITwC/G2K4zDHf@public.gmane.org> writes:

> Hello!
>
> We often use Pandoc to convert from docx to HTML, and many of the
> documents we convert include tables. As far as we can tell, almost all
> of the table styling is lost in the docx reader. Specifically, we care
> about 5 things:
>
> 1. Text justification (left, center, or right)
>
> 2. Vertical alignment (top, middle, or bottom)
>
> 3. Text indentation
>
> 4. Cell shading and text color
>
> 5. Table borders
>
> We hope to enhance the docx reader so that these stylings get preserved
> in the AST.
>
> Proposed solutions:
>
> 1. It seems like text justification already exists in the AST through
> the Alignment value. It just needs to get implemented in the docx
> reader, as described in this issue:
> https://github.com/jgm/pandoc/issues/6316
>
> 2. Add the vertical alignment style to attributes as suggested here
>
> 3. Add text indentation to attributes in the form of the style
> padding-left
>
> 4. Add cell shading and text color to attributes in the form of the
> styles background-color and color
>
> 5. Add table borders to attributes in the form of the style border
>
>
> Does this sound like a sane and feasible solution? We're pretty
> motivated and willing to work on these changes, just want to know if
> they would be the best route!
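
To make the five proposed mappings concrete, here is a minimal sketch
of how cell formatting could be turned into an alignment value plus a
`style` attribute. It only models the data shapes: pandoc's Attr is a
triple of identifier, classes, and key-value pairs, and the alignment
names are those of the AST. The `CellFormat` record and both functions
are hypothetical and not part of pandoc or its docx reader; whether
each writer would emit such a `style` attribute is a separate
question.

```python
from dataclasses import dataclass
from typing import Optional

# Alignment constructor names as they appear in pandoc's AST.
ALIGNMENTS = {"left": "AlignLeft", "center": "AlignCenter", "right": "AlignRight"}


@dataclass
class CellFormat:
    """Hypothetical, simplified view of one docx table cell's formatting."""
    justify: Optional[str] = None         # "left", "center", or "right"
    vertical_align: Optional[str] = None  # "top", "middle", or "bottom"
    indent_pt: Optional[float] = None     # text indentation in points
    shading: Optional[str] = None         # hex colour without "#", e.g. "D9D9D9"
    text_color: Optional[str] = None      # hex colour without "#"
    border: Optional[str] = None          # CSS shorthand, e.g. "1px solid #000"


def cell_alignment(fmt: CellFormat) -> str:
    """Item 1: justification maps onto the existing Alignment value."""
    return ALIGNMENTS.get(fmt.justify, "AlignDefault")


def cell_attr(fmt: CellFormat) -> tuple:
    """Items 2-5: everything else is collected into a single style attribute."""
    decls = []
    if fmt.vertical_align:
        decls.append(("vertical-align", fmt.vertical_align))
    if fmt.indent_pt is not None:
        decls.append(("padding-left", f"{fmt.indent_pt:g}pt"))
    if fmt.shading:
        decls.append(("background-color", f"#{fmt.shading}"))
    if fmt.text_color:
        decls.append(("color", f"#{fmt.text_color}"))
    if fmt.border:
        decls.append(("border", fmt.border))
    style = "; ".join(f"{key}: {value}" for key, value in decls)
    # Attr shape: (identifier, classes, key-value pairs)
    return ("", [], [("style", style)] if style else [])
```

A cell with no special formatting yields `AlignDefault` and an empty
Attr, so unstyled tables would be unaffected by such a mapping.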


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

