From: Noah Malmed
Newsgroups: gmane.text.pandoc
Subject: Re: Feature Idea: docx -> HTML table styling
Date: Thu, 16 Jun 2022 08:19:41 -0700 (PDT)
References: <87y1xxvzt3.fsf@zeitkraut.de>
To: pandoc-discuss

Hi Albert!

Thanks for the info! We actually have basically the same use case! Definitely will check out transpect!

Thanks,

Noah

On Thursday, June 16, 2022 at 2:21:37 AM UTC-5 Albert Krewinkel wrote:
Hi Noah,

Just chiming in to report on our experiences with tables in a project
where we used pandoc to publish journal articles. Our main goal there
was to publish HTML and PDFs from Docx inputs, with an option to handle
JATS as well (Project: <https://oa-pub.hos.tuhh.de/en/project/> Journal:
<https://kommunikation-gesellschaft.de>).

We found that authors writing in Word essentially use tables as a
graphic and layouting tool. Markdown was used as our central format,
which worked extremely well: we converted Docx -> Markdown, fixed markup
when necessary, then published via pandoc. Just tables proved
problematic. For some tables, we ended up writing separate HTML and PDFs
by hand. See the "Sonderausgabe | Podcast" in that journal for results.

This is just to say that pandoc may not be the right tool if you aim for
*fully automatic* conversion of scholarly Docx articles. Maybe tables
should just be expected to require manual tuning.

I believe that [transpect](https://transpect.github.io) tries to
preserve more of the styling, maybe it is more in line with what you
need? Citation support isn't as complete though (last I heard).

Happy to answer questions about any of this.

Cheers,
Albert


Noah Malmed <nma...@scholasticahq.com> writes:

> Hello!
>
> We use Pandoc often to convert from docx to HTML, and many of the
> documents we convert include tables. As far as we can tell, almost all
> of the table styling is lost in the docx reader. Specifically, we care
> about 5 things:
>
> 1. Text justification (left, center, or right)
>
> 2. Vertical alignment (top, middle, or bottom)
>
> 3. Text indentation
>
> 4. Cell shading and text color
>
> 5. Table borders
>
> We hope to enhance the docx reader so that these stylings get preserved
> in the AST.
>
> Proposed solutions:
>
> 1. It seems like text justification already exists in the AST through
> the Alignment value. It just needs to get implemented in the docx
> reader, as described in this issue:
> https://github.com/jgm/pandoc/issues/6316
>
> 2. Add the vertical alignment style to attributes as suggested here
>
> 3. Add text indentation to attributes in the form of the style
> padding-left
>
> 4. Add cell shading and text color to attributes in the form of the
> styles background-color and color
>
> 5. Add table borders to attributes in the form of the style border
>
>
> Does this sound like a sane and feasible solution? We're pretty
> motivated and willing to work on these changes, just want to know if
> they would be the best route!


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
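
As a rough illustration of proposed solutions 2 to 5 in the quoted message (carrying vertical alignment, indentation, shading, text color, and borders as style attributes on a cell), here is a minimal Python sketch. It is not pandoc's docx reader or AST: the property names, `CSS_PROPERTY`, `cell_style`, and `render_cell` are all hypothetical, chosen only to show how such attributes could end up in an HTML `style` attribute. Text justification (item 1) is left out because the message says it would go through the existing Alignment value instead.

```python
from html import escape

# Hypothetical mapping from docx-style cell properties to CSS properties.
# The keys on the left are illustrative names, not pandoc's or Word's API.
CSS_PROPERTY = {
    "vertical_align": "vertical-align",
    "indent": "padding-left",
    "shading": "background-color",
    "text_color": "color",
    "border": "border",
}


def cell_style(props):
    """Build a CSS style string from the cell properties that are set."""
    declarations = [
        f"{css}: {props[key]}"
        for key, css in CSS_PROPERTY.items()
        if props.get(key)
    ]
    return "; ".join(declarations)


def render_cell(text, props):
    """Render one table cell as HTML, carrying the styles in a style attribute."""
    style = cell_style(props)
    attr = f' style="{escape(style, quote=True)}"' if style else ""
    return f"<td{attr}>{escape(text)}</td>"


if __name__ == "__main__":
    print(render_cell("42", {"vertical_align": "middle", "shading": "#eeeeee"}))
```

Cells without any of these properties render as a plain `<td>`, so documents that carry no table styling would be unaffected under this assumed scheme.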
