Hi John,

Thanks for responding! I have a few clarifying questions, mainly around `Attr`, because I don't quite understand what values it is meant to store.

When you say adding `vertical-align` to attributes should be okay, what do you mean exactly? Would it be more appropriate to store it in the XML format that docx uses to denote vertical alignment?

I was also a little thrown off by some black-box testing we did on the HTML reader. When we ran `pandoc -f html -t native` on the following HTML:

<table>
   <tbody>
      <tr>
         <td style="background-color: green">green background</td>
      </tr>
   </tbody>
</table>

We received the following output:

[ Table
    ( "" , [] , [] )
    (Caption Nothing [])
    [ ( AlignDefault , ColWidthDefault ) ]
    (TableHead ( "" , [] , [] ) [])
    [ TableBody
        ( "" , [] , [] )
        (RowHeadColumns 0)
        []
        [ Row
            ( "" , [] , [] )
            [ Cell
                ( "" , [] , [ ( "style" , "background-color: green" ) ] )
                AlignDefault
                (RowSpan 1)
                (ColSpan 1)
                [ Plain [ Str "green" , Space , Str "background" ] ]
            ]
        ]
    ]
    (TableFoot ( "" , [] , [] ) [])
]


Seeing that the style was preserved led us to believe that it would be appropriate to store some styling in the AST. Is the problem with our proposed solution that we would be storing information that is specific to HTML? Is there a more generic, format-neutral representation that would be more appropriate for storing that information?
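
For concreteness, here is how I currently read the triple that appears on each element in the output above: (identifier, classes, key-value pairs), with the HTML reader putting the raw style string into the key-value list. Below is a minimal sketch of that reading in Python, using plain tuples as stand-ins rather than pandoc's actual types. `parse_style` is a hypothetical helper of ours, not something that exists in pandoc, and the splitting is deliberately naive.

```python
# Stand-in for the attribute triple seen in the native output:
# (identifier, classes, key-value pairs). These are plain Python
# tuples for illustration, not pandoc's real data types.

def parse_style(attr):
    """Return the CSS declarations found under the 'style' key as a dict.

    Hypothetical helper. The split on ';' and ':' is naive and would
    mishandle values that themselves contain those characters.
    """
    _identifier, _classes, key_values = attr
    style = dict(key_values).get("style", "")
    properties = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep and name.strip():
            properties[name.strip().lower()] = value.strip()
    return properties


# The Cell attribute from the output above.
cell_attr = ("", [], [("style", "background-color: green")])
print(parse_style(cell_attr))  # {'background-color': 'green'}
```

If that reading is right, anything a reader stores this way stays an opaque string unless a writer goes looking for that particular key, which I think is the point you were making about `vertical-align` and the HTML writer.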

Thanks,

Noah
 


On Wednesday, June 15, 2022 at 5:36:18 PM UTC-5 John MacFarlane wrote:
Noah Malmed <nma...@scholasticahq.com> writes:

> Hello!
>
> We use Pandoc often to convert from docx to HTML, and many of the
> documents we convert include tables. As far as we can tell, almost all of
> the table styling is lost in the docx reader. Specifically, we care about 5
> things:
>
> 1. Text justification (left, center, or right)
>
> 2. Vertical alignment (top, middle, or bottom)
>
> 3. Text indentation
>
> 4. Cell shading and text color
>
> 5. Table borders
>
> We hope to enhance the docx reader so that these stylings get preserved in
> the AST.
>
> Proposed solutions:
>
> 1. It seems like text justification already exists in the AST through the
> Alignment value. It just needs to get implemented in the docx reader, as
> described in this issue: https://github.com/jgm/pandoc/issues/6316

Correct.

> 2. Add the vertical alignment style to attributes as suggested here
> <https://github.com/jgm/pandoc/issues/7444#issuecomment-881649125>

Should be okay. However, adding `vertical-align` there won't do
any good for converting to HTML unless the HTML writer is
modified to be sensitive to this attribute.

> 3. Add text indentation to attributes in the form of the style padding-left

You're talking about directly adding 'style' to attributes, with
CSS contents? That would make the docx reader very good for
converting to HTML and not so good for any other format.

Note that in general pandoc does not strive to preserve every
small detail of formatting, only structure. See the beginning
of the manual.

> 4. Add cell shading and text color to attributes in the form of the styles
> background-color and color

See above, also search the issue tracker for 'color'.

> 5. Add table borders to attributes in the form of the style border

I think this falls into the category of things that are beyond
pandoc's scope. We don't strive to reproduce all the formatting
details in conversions. Again, see the beginning of the manual.
