Subject: DOCX to HTML: Looking to Select More Precisely with CSS
From: Ken Dow @ 2019-04-11 17:48 UTC
To: pandoc-discuss


Hi,

I'm looking to get more specific control over the HTML generated by Pandoc 
from DOCX input. I was hoping Word's paragraph styles would be assigned to 
output HTML block elements, along the lines of how headings can be output:

<section id="system-and-user-administration" class="level1">

I tried repurposing otherwise unused styles that the Word reference doc 
<https://pandoc.org/MANUAL.html#option--reference-doc> supports, such as 
Definition Term. When Definition Term is applied to a paragraph and that 
paragraph is followed by one in the Definition style, the following (very 
sensible) HTML is output:

<dl>
  <dt>Important</dt>
  <dd><p>If you are restoring from a backup created on another LXI 
network with different IP addresses, see “Areas, Zones, And Devices Lost 
From Floor Plan” on page 118 before proceeding.</p>
  </dd>
</dl>

But if Definition Term is used on its own (i.e., not followed by a 
Definition paragraph), the following is output:

<div class="DefinitionTerm">
  <p>Important</p>
</div>

This second behaviour gives me a class I can select on to style the output, 
but it's limited: the class sits on a wrapper <div> rather than on the 
paragraph itself, and the markup isn't semantic.
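
To illustrate what "style name on the block element itself" could look 
like, here is a minimal post-processing sketch in Python. It is not a 
Pandoc feature: the function name hoist_style_class is made up for this 
example, and it assumes the output always has exactly the shape shown 
above (a <div> with one class wrapping a single <p>).

```python
import re

# Assumed shape: <div class="X"> wrapping exactly one <p>...</p>, as in the
# DefinitionTerm output above. The body may not contain another <p> or </p>,
# so divs holding several paragraphs are left untouched.
WRAPPED_P = re.compile(
    r'<div class="(?P<cls>[\w-]+)">\s*'
    r'<p>(?P<body>(?:(?!</?p[\s>]).)*)</p>\s*'
    r'</div>',
    re.DOTALL,
)


def hoist_style_class(html: str) -> str:
    """Rewrite <div class="X"><p>...</p></div> as <p class="X">...</p>."""
    return WRAPPED_P.sub(r'<p class="\g<cls>">\g<body></p>', html)


sample = '<div class="DefinitionTerm">\n  <p>Important</p>\n </div>'
print(hoist_style_class(sample))
# prints: <p class="DefinitionTerm">Important</p>
```

A regular expression is only adequate for this one fixed pattern; an HTML 
parser would be the safer choice for anything more varied.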

I also discovered that font variations (e.g., italics, small caps) carry 
over to the HTML when they are applied through a Word Character Style, but 
not when the same variations are set in a Word Paragraph Style. For 
example, Character Style:

<p>
  <em>
    <span style="font-variant: small-caps;">Result</span>
  </em>
</p>

vs. Paragraph Style:

<p>Result</p>
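
One way to check that both kinds of style really carry the same formatting 
in the DOCX is to look at the run properties (w:rPr) each style defines in 
word/styles.xml. The sketch below is a diagnostic aid only and says 
nothing about what Pandoc does with those properties. The function name 
and the style IDs ResultPara and ResultChar are hypothetical, and the 
sample XML is a hand-written fragment, not taken from a real document.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/styles.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def styles_with_run_formatting(styles_xml: str) -> dict:
    """Map each styleId to (style type, sorted run-property names)."""
    root = ET.fromstring(styles_xml)
    found = {}
    for style in root.findall("w:style", NS):
        rpr = style.find("w:rPr", NS)
        if rpr is None:
            continue  # this style defines no run-level formatting
        # Strip the "{namespace}" prefix from each child tag, e.g. "i".
        props = sorted(child.tag.split("}")[1] for child in rpr)
        style_id = style.get(f"{{{W}}}styleId")
        found[style_id] = (style.get(f"{{{W}}}type"), props)
    return found


# Hypothetical fragment; with a real file, read "word/styles.xml" from the
# .docx using zipfile.ZipFile and pass its decoded contents instead.
sample = f"""<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="ResultPara">
    <w:rPr><w:i/><w:smallCaps/></w:rPr>
  </w:style>
  <w:style w:type="character" w:styleId="ResultChar">
    <w:rPr><w:i/><w:smallCaps/></w:rPr>
  </w:style>
</w:styles>"""

for style_id, (kind, props) in styles_with_run_formatting(sample).items():
    print(style_id, kind, props)
```

If both styles list the same properties, the difference in the HTML comes 
from the conversion rather than from the Word file.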

Any corrections, tips, or pointers would be most welcome!

