From: Noah Malmed
Newsgroups: gmane.text.pandoc
Subject: Re: Feature Idea: docx -> HTML table styling
Date: Thu, 16 Jun 2022 08:16:25 -0700 (PDT)
To: pandoc-discuss
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Hi John,

Thanks for responding! I have a few clarifying questions, mainly about Attr, because I don't quite understand what values are stored in it.

When you say that adding `vertical-align` to attributes should be okay, what do you mean? Would it be more appropriate to store it in the XML format that docx uses to denote vertical alignment?
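
To make the second option concrete: as far as we understand, docx records a cell's vertical alignment as a `w:vAlign` element inside the cell properties (`w:tcPr`), with values such as `top`, `center`, and `bottom`. The sketch below, in Python with only the standard library, reads that value from a hand-written XML fragment and maps it to a `vertical-align` key-value pair. The fragment, the `VALIGN_TO_CSS` table, and the `cell_valign_attr` helper are our own illustration and assumptions, not pandoc code.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by docx documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical mapping from docx values to the top/middle/bottom names we care about.
VALIGN_TO_CSS = {"top": "top", "center": "middle", "bottom": "bottom"}

# Hand-written example cell; a real document.xml has much more around it.
CELL_XML = f"""
<w:tc xmlns:w="{W_NS}">
  <w:tcPr><w:vAlign w:val="center"/></w:tcPr>
  <w:p><w:r><w:t>green background</w:t></w:r></w:p>
</w:tc>
"""


def cell_valign_attr(cell_xml):
    """Return [("vertical-align", value)] for a cell, or [] if none is set."""
    cell = ET.fromstring(cell_xml.strip())
    valign = cell.find(f"{{{W_NS}}}tcPr/{{{W_NS}}}vAlign")
    if valign is None:
        return []
    raw = valign.get(f"{{{W_NS}}}val")
    css = VALIGN_TO_CSS.get(raw)
    return [("vertical-align", css)] if css else []


print(cell_valign_attr(CELL_XML))
```

Whether the value should be stored under the docx name (`center`) or a translated one (`middle`) is exactly the part we are unsure about.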

Also, I think I was a little thrown off by some black-box testing we did on the HTML reader. When we ran `pandoc -f html -t native` on the following HTML:

<table>
   <tbody>
      <tr>
         <td style="background-color: green">green background</td>
      </tr>
   </tbody>
</table>

We received the following output:

[ Table
    ( "" , [] , [] )
    (Caption Nothing [])
    [ ( AlignDefault , ColWidthDefault ) ]
    (TableHead ( "" , [] , [] ) [])
    [ TableBody
        ( "" , [] , [] )
        (RowHeadColumns 0)
        []
        [ Row
            ( "" , [] , [] )
            [ Cell
                ( "" , [] , [ ( "style" , "background-color: green" ) ] )
                AlignDefault
                (RowSpan 1)
                (ColSpan 1)
                [ Plain [ Str "green" , Space , Str "background" ] ]
            ]
        ]
    ]
    (TableFoot ( "" , [] , [] ) [])
]
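
For reference while reading the output: each `( "" , [] , [] )` triple is an Attr, which holds an identifier, a list of classes, and a list of key-value pairs. The Python sketch below only models that shape to show where the `style` value ended up. The `Attr` class and `lookup` helper are our own names for illustration, not part of pandoc.

```python
from typing import List, NamedTuple, Optional, Tuple


class Attr(NamedTuple):
    """Illustrative stand-in for pandoc's Attr triple."""
    identifier: str
    classes: List[str]
    keyvals: List[Tuple[str, str]]


def lookup(attr: Attr, key: str) -> Optional[str]:
    """Return the first value stored under key, or None if it is absent."""
    for k, v in attr.keyvals:
        if k == key:
            return v
    return None


# The Attr of the Cell in the output above.
cell_attr = Attr("", [], [("style", "background-color: green")])

print(lookup(cell_attr, "style"))           # the whole CSS string, unparsed
print(lookup(cell_attr, "vertical-align"))  # None: nothing stored under this key
```

So the reader kept the CSS as one opaque string under the `style` key, rather than as separate properties.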


Seeing that the style was preserved led us to believe that it would be appropriate to store some styling in the AST. Is the problem with our proposed solution that we would be storing information that is specific to HTML? Is there perhaps a more generic language that would be more appropriate for storing that information?
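
One possible reading of "more generic" is to store each property as its own key-value pair rather than as a single CSS string, so that writers other than HTML could pick out the keys they understand. A rough Python sketch of that split follows. `split_style` is a hypothetical helper, it ignores CSS edge cases such as semicolons inside quoted values, and we are not claiming pandoc does anything like this.

```python
def split_style(style):
    """Split a simple CSS declaration string into (property, value) pairs."""
    pairs = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed pieces
        prop, value = declaration.split(":", 1)
        prop, value = prop.strip().lower(), value.strip()
        if prop and value:
            pairs.append((prop, value))
    return pairs


print(split_style("background-color: green; vertical-align: middle"))
```

The property names would still be CSS names, so this only moves the question from the value to the keys.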

Thanks,

Noah


On Wednesday, June 15, 2022 at 5:36:18 PM UTC-5 John MacFarlane wrote:
Noah Malmed <nma...@scholasticahq.com> writes:

> Hello!
>
> We use Pandoc often to convert from docx to HTML, and many of the
> documents we convert include tables. As far as we can tell, almost all of
> the table styling is lost in the docx reader. Specifically, we care about 5
> things:
>
> 1. Text justification (left, center, or right)
>
> 2. Vertical alignment (top, middle, or bottom)
>
> 3. Text indentation
>
> 4. Cell shading and text color
>
> 5. Table borders
>
> We hope to enhance the docx reader so that these stylings get preserved in
> the AST.
>
> Proposed solutions:
>
> 1. It seems like text justification already exists in the AST through the
> Alignment value. It just needs to get implemented in the docx reader, as
> described in this issue: https://github.com/jgm/pandoc/issues/6316

Correct.

> 2. Add the vertical alignment style to attributes as suggested here
> <https://github.com/jgm/pandoc/issues/7444#issuecomment-881649125>

Should be okay. However, adding `vertical-align` there won't do
any good for converting to HTML unless the HTML writer is
modified to be sensitive to this attribute.

> 3. Add text indentation to attributes in the form of the style padding-left

You're talking about directly adding 'style' to attributes, with
CSS contents? That would make the docx reader very good for
converting to HTML and not so good for any other format.

Note that in general pandoc does not strive to preserve every
small detail of formatting, only structure. See the beginning
of the manual.

> 4. Add cell shading and text color to attributes in the form of the styles
> background-color and color

See above, also search the issue tracker for 'color'.

> 5. Add table borders to attributes in the form of the style border

I think this falls into the category of things that are beyond
pandoc's scope. We don't strive to reproduce all the formatting
details in conversions. Again, see the beginning of the manual.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/ec31e976-089b-4916-949a-fad874b2a8adn%40googlegroups.com.