public inbox archive for pandoc-discuss@googlegroups.com
From: Noah Malmed <nmalmed-O2gogPphfo5dNrB6XyqITwC/G2K4zDHf@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Feature Idea: docx -> HTML table styling
Date: Thu, 16 Jun 2022 08:19:41 -0700 (PDT)
Message-ID: <ad6d8a3c-2e96-46e6-af3c-7370801f67c6n@googlegroups.com>
In-Reply-To: <87y1xxvzt3.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>


Hi Albert!

Thanks for the info! We have essentially the same use case, and we will
definitely check out transpect!

Thanks,

Noah

On Thursday, June 16, 2022 at 2:21:37 AM UTC-5 Albert Krewinkel wrote:

> Hi Noah,
>
> Just chiming in to report on our experiences with tables in a project
> where we used pandoc to publish journal articles. Our main goal there
> was to publish HTML and PDFs from Docx inputs, with an option to handle
> JATS as well (Project: <https://oa-pub.hos.tuhh.de/en/project/> Journal:
> <https://kommunikation-gesellschaft.de>).
>
> We found that authors writing in Word essentially use tables as a
> graphic and layout tool. Markdown was used as our central format,
> which worked extremely well: we converted Docx -> Markdown, fixed markup
> when necessary, then published via pandoc. Only tables proved
> problematic. For some tables, we ended up writing separate HTML and PDFs
> by hand. See the "Sonderausgabe | Podcast" in that journal for results.
>
> This is just to say that pandoc may not be the right tool if you aim for
> *fully automatic* conversion of scholarly Docx articles. Maybe tables
> should just be expected to require manual tuning.
>
> I believe that [transpect](https://transpect.github.io) tries to
> preserve more of the styling; maybe it is more in line with what you
> need? Citation support isn't as complete, though (last I heard).
>
> Happy to answer questions about any of this.
>
> Cheers,
> Albert
>
>
> Noah Malmed <nma...-O2gogPphfo5dNrB6XyqITwC/G2K4zDHf@public.gmane.org> writes:
>
> > Hello!
> >
> > We use Pandoc often to convert from docx to HTML, and many of the
> > documents we convert include tables. As far as we can tell, almost all
> > of the table styling is lost in the docx reader. Specifically, we care
> > about 5 things:
> >
> > 1. Text justification (left, center, or right)
> >
> > 2. Vertical alignment (top, middle, or bottom)
> >
> > 3. Text indentation
> >
> > 4. Cell shading and text color
> >
> > 5. Table borders
> >
> > We hope to enhance the docx reader so that these stylings get preserved
> > in the AST.
> >
> > Proposed solutions:
> >
> > 1. It seems like text justification already exists in the AST through
> > the Alignment value. It just needs to get implemented in the docx
> > reader, as described in this issue:
> > https://github.com/jgm/pandoc/issues/6316
> >
> > 2. Add the vertical alignment style to attributes as suggested here
> >
> > 3. Add text indentation to attributes in the form of the style
> > padding-left
> >
> > 4. Add cell shading and text color to attributes in the form of the
> > styles background-color and color
> >
> > 5. Add table borders to attributes in the form of the style border
> >
> >
> > Does this sound like a sane and feasible solution? We're pretty
> > motivated and willing to work on these changes, just want to know if
> > they would be the best route!
>
>
> -- 
> Albert Krewinkel
> GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
>


