public inbox archive for pandoc-discuss@googlegroups.com
* Pandoc improperly converts HTML formulae to .docx
@ 2022-01-14 10:37 Cassandra Lyderitz
From: Cassandra Lyderitz @ 2022-01-14 10:37 UTC (permalink / raw)
  To: pandoc-discuss


I am wondering how to properly convert display-format formulae in HTML into MS Word format with pandoc so that the Unicode markup of the formulae is interpreted as math rather than copied as plain text.
The minimal HTML file I am trying to convert, the output I get, and the output I would expect (which is actually the output of a direct .md to .docx conversion) are attached to the post.

The command I run is:
`$ pandoc --to docx -o repro_html_to_word.docx repro_html_to_word.html`

The issue occurs both when the formulae are inline (they end up wrapped in literal \(...\) in the .docx) and when they are in display format (wrapped in literal \[...\]).
If I convert in the reverse direction, from a .docx document with formulae to .html, the formulae are rendered properly.
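One workaround worth trying, offered as an untested sketch rather than a known fix: pandoc's own HTML writer marks formulae as `<span class="math inline">\(...\)</span>` and `<span class="math display">\[...\]</span>`. Assuming the HTML file uses that markup and the HTML reader keeps these as ordinary Span elements (the literal \(...\) in the output suggests it does), a small JSON filter can rewrite them as pandoc Math elements, which the docx writer emits as native Word equations. The file name `math_spans.py` and the helpers `span_text`, `to_math` and `walk` are hypothetical.

```python
#!/usr/bin/env python3
r"""Sketch of a pandoc JSON filter (hypothetical name: math_spans.py).

Assumption: formulae arrive as MathJax-style spans, e.g.
<span class="math inline">\(x^2\)</span>, which the HTML reader keeps as
ordinary Span nodes. The filter rewrites them as Math nodes, which the docx
writer turns into native Word equations.
"""
import json
import sys

# Span class -> (opening delimiter, closing delimiter, pandoc MathType)
DELIMS = {
    "inline": ("\\(", "\\)", "InlineMath"),
    "display": ("\\[", "\\]", "DisplayMath"),
}


def span_text(inlines):
    """Return the plain text of a span, or None if it holds non-text nodes."""
    parts = []
    for node in inlines:
        if node.get("t") == "Str":
            parts.append(node["c"])
        elif node.get("t") in ("Space", "SoftBreak", "LineBreak"):
            parts.append(" ")
        else:
            return None  # formatted content: leave the span alone
    return "".join(parts)


def to_math(node):
    """Convert a Span with class 'math' into a Math node, else return None."""
    if node.get("t") != "Span":
        return None
    (_ident, classes, _attrs), content = node["c"]
    if "math" not in classes:
        return None
    kind = "display" if "display" in classes else "inline"
    opening, closing, math_type = DELIMS[kind]
    text = span_text(content)
    if text is None:
        return None
    text = text.strip()
    # Drop the \( \) or \[ \] delimiters; a Math node holds bare TeX.
    if text.startswith(opening) and text.endswith(closing):
        text = text[len(opening):-len(closing)]
    return {"t": "Math", "c": [{"t": math_type}, text.strip()]}


def walk(value):
    """Rebuild the AST recursively, replacing math spans along the way."""
    if isinstance(value, list):
        return [walk(item) for item in value]
    if isinstance(value, dict):
        math = to_math(value)
        if math is not None:
            return math
        return {key: walk(item) for key, item in value.items()}
    return value


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"])
    json.dump(doc, sys.stdout)


# pandoc's --filter option passes the target format (e.g. "docx") as the
# first argument; only act as a filter when invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Possible usage, assuming the script is saved as `math_spans.py` and made executable (`chmod +x math_spans.py`): `pandoc --filter ./math_spans.py --to docx -o repro_html_to_word.docx repro_html_to_word.html`. Spans whose content is not plain text are deliberately left untouched rather than guessed at.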

Any help would be very much appreciated. 
---
Background: I am asking this because such a conversion would complete my pipeline for outputting R Markdown documents with stargazer regression tables to MS Word.
AFAIK, stargazer currently supports output to TeX and HTML, but not to Word. Although MS Word can render standalone HTML files (e.g. it displays stargazer's HTML table output), the HTML markup that gets pasted into the Markdown file is rendered improperly and the table is not constructed.

I am cognizant of some other R packages that target MS Word specifically, notably `flextable`, `gtsummary` and `huxtable`. However, they lack the configuration options I need and/or require a lot of configuration, and while I am exploring all available options, I would like to know whether there is a workaround for this obstacle.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/82b5190d-a7cc-4cba-9126-d7071c3d7279n%40googlegroups.com.

[-- Attachment #2: repro_html_to_word_expected.docx --]
[-- Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document, Size: 16304 bytes --]

[-- Attachment #3: repro_html_to_word_real.docx --]
[-- Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document, Size: 16301 bytes --]

[-- Attachment #4: repro_html_to_word.html --]
[-- Type: text/html, Size: 5280 bytes --]
