For those unfamiliar with fldChar fields, this comment from the docx parse code (parse.hs, starting on line 825) explains how they work:

fldChar fields work by first
having a <w:fldChar fldCharType="begin"> in a run, then a run with
<w:instrText>, then a <w:fldChar fldCharType="separate"> run, then the
content runs, and finally a <w:fldChar fldCharType="end"> run. For
example (omissions and my comments in brackets):

<w:r>
[...]
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
[...]
<w:instrText xml:space="preserve"> HYPERLINK [hyperlink url] </w:instrText>
</w:r>
<w:r>
[...]
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidRPr=[...]>
[...]
<w:t>Foundations of Analysis, 2nd Edition</w:t>
</w:r>
<w:r>
[...]
<w:fldChar w:fldCharType="end"/>
</w:r>

The current way of parsing fldChar fields doesn't take into account that they can be nested, so the end of a nested fldChar field is interpreted as the end of the surrounding one. This can, for example, make a hyperlink end too soon. See the attached example for a docx that demonstrates this.

I propose to fix this by turning the fldChar state into a stack, so that a field can be started and ended inside other fields. I will include this in my pull request for PAGEREF fields that I announced here a while ago, since they are related.
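
To make the stack idea concrete, here is a minimal sketch. It is not the actual pandoc code: the types Run and Field and the function annotate are hypothetical names invented for illustration, and a docx run is reduced to the few cases that matter for fields. Each text run is tagged with the instructions of every field whose result part contains it, innermost first.

```haskell
-- Hypothetical, simplified model of fldChar parsing with a stack.
-- None of these names are taken from pandoc's docx reader.

-- The runs that matter for field parsing.
data Run
  = FldBegin          -- <w:fldChar w:fldCharType="begin"/>
  | InstrText String  -- <w:instrText>...</w:instrText>
  | FldSeparate       -- <w:fldChar w:fldCharType="separate"/>
  | FldEnd            -- <w:fldChar w:fldCharType="end"/>
  | TextRun String    -- ordinary content run (<w:t>)
  deriving (Show, Eq)

-- One open field: its instruction text so far, and whether
-- its "separate" marker has been seen yet.
data Field = Field
  { instr        :: String
  , pastSeparate :: Bool
  } deriving (Show, Eq)

-- Pair every text run with the instructions of all enclosing
-- fields that are past their "separate" marker (innermost first).
annotate :: [Run] -> [(String, [String])]
annotate = go []
  where
    go _ [] = []
    go stack (r : rs) = case r of
      -- A new field is pushed, even if another one is still open.
      FldBegin -> go (Field "" False : stack) rs
      -- Instruction text belongs to the innermost open field.
      InstrText s -> case stack of
        (f : fs) | not (pastSeparate f) ->
          go (f { instr = instr f ++ s } : fs) rs
        _ -> go stack rs
      FldSeparate -> case stack of
        (f : fs) -> go (f { pastSeparate = True } : fs) rs
        []       -> go stack rs
      -- "end" closes only the innermost field.
      FldEnd -> go (drop 1 stack) rs
      TextRun t ->
        (t, [instr f | f <- stack, pastSeparate f]) : go stack rs

-- A HYPERLINK field whose content contains a nested PAGEREF field.
example :: [Run]
example =
  [ FldBegin, InstrText "HYPERLINK a", FldSeparate
  , TextRun "see page "
  , FldBegin, InstrText "PAGEREF b", FldSeparate, TextRun "7", FldEnd
  , TextRun " for details"
  , FldEnd
  , TextRun " after"
  ]
```

Evaluating annotate example in GHCi shows that " for details" is still tagged with "HYPERLINK a". With a single non-stack state, the "end" of the PAGEREF field would have closed the hyperlink at that point.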
