Probably this deserves to be turned into an issue and a related pull request, if you haven't opened one already. Ideally the pull request would include a unit test.

On Thursday, May 24, 2018 at 11:06:34 PM UTC+2, pando...-Mmb7MZpHnFY@public.gmane.org wrote:
>
> Text/Pandoc/Readers/Docx/Parse.hs handles complex fields as if the
> "separate" field char and subsequent runs were obligatory, but they are
> optional.
>
> While working on other things, I came across this snippet of docx XML
> (simplified here):
>
> SEQ CHAPTER \h \r 1
>
> ECMA-376-1:2016, 17.16.2 (p. 1165), however, marks these parts as optional.
>
> Parsing this had the FldCharState transition from Closed -begin-> Open
> -instr-> FieldInfo, and then ignore the end. Only when a later field
> contained a separator would it continue to -separate-> CharContent -end->
> Closed.
>
> As a fix, I added a case for when end is encountered in the FieldInfo state:
>
> Fixed a bug in complex field handling: separator fields and rendition runs
> are optional.
>
> ---
>  src/Text/Pandoc/Readers/Docx/Parse.hs | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/Text/Pandoc/Readers/Docx/Parse.hs b/src/Text/Pandoc/Readers/Docx/Parse.hs
> index 221260f42..b5226a95a 100644
> --- a/src/Text/Pandoc/Readers/Docx/Parse.hs
> +++ b/src/Text/Pandoc/Readers/Docx/Parse.hs
> @@ -830,9 +830,12 @@ elemToParPart ns element
>          FldCharClosed | fldCharType == "begin" -> do
>            modify $ \st -> st {stateFldCharState = FldCharOpen}
>            return NullParPart
> -        FldCharFieldInfo info | fldCharType == "separate" -> do
> +        FldCharFieldInfo info | fldCharType == "separate" -> do -- optional separator before rendition
>            modify $ \st -> st {stateFldCharState = FldCharContent info []}
>            return NullParPart
> +        FldCharFieldInfo info | fldCharType == "end" -> do -- direct end, without rendition
> +          modify $ \st -> st {stateFldCharState = FldCharClosed}
> +          return $ Field info []
>          FldCharContent info runs | fldCharType == "end" -> do -- fxg: End in same par
>            modify $ \st -> st {stateFldCharState = FldCharClosed}
>            return $ Field info $ reverse runs
> --
> 2.11.0
>
> I tested that with the current pandoc git master HEAD, versioned 2.2.1
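
As a starting point for the unit test, here is a minimal, self-contained sketch of the field-char state machine described in the quoted message, including the direct FieldInfo -end-> Closed transition. It is only a model, not pandoc's code: the types Event, FldState and Part and the functions step and runEvents are hypothetical stand-ins invented for illustration, and the real reader works in a state monad over XML elements and parses the instruction text rather than keeping it as a string.

```haskell
import Data.List (foldl')
import Data.Maybe (maybeToList)

-- Hypothetical, simplified stand-ins for the reader's input and types.
data Event
  = FldChar String   -- fldCharType: "begin", "separate" or "end"
  | Instr String     -- field instruction text
  | RunText String   -- an ordinary run of text
  deriving (Show, Eq)

data FldState
  = Closed
  | Open
  | FieldInfo String
  | Content String [String]  -- rendition runs, accumulated in reverse
  deriving (Show, Eq)

data Part
  = Field String [String]  -- instruction plus (possibly empty) rendition
  | Plain String
  deriving (Show, Eq)

-- One transition: the next state and an optionally emitted part.
step :: FldState -> Event -> (FldState, Maybe Part)
step Closed         (FldChar "begin")    = (Open, Nothing)
step Open           (Instr i)            = (FieldInfo i, Nothing)
-- Optional separator before the rendition runs.
step (FieldInfo i)  (FldChar "separate") = (Content i [], Nothing)
-- Direct end without separator or rendition (the case the patch adds).
step (FieldInfo i)  (FldChar "end")      = (Closed, Just (Field i []))
step (Content i rs) (RunText r)          = (Content i (r : rs), Nothing)
step (Content i rs) (FldChar "end")      = (Closed, Just (Field i (reverse rs)))
step Closed         (RunText r)          = (Closed, Just (Plain r))
-- Anything else is ignored in this model.
step st             _                    = (st, Nothing)

-- Feed a list of events through the machine, collecting emitted parts.
runEvents :: [Event] -> (FldState, [Part])
runEvents = foldl' go (Closed, [])
  where
    go (st, acc) ev =
      let (st', out) = step st ev
      in  (st', acc ++ maybeToList out)
```

In this model, a field without a separator, such as `[FldChar "begin", Instr "SEQ CHAPTER \\h \\r 1", FldChar "end", RunText "after"]`, ends in the `Closed` state and yields a `Field` with an empty rendition followed by `Plain "after"`. If the `FieldInfo`/`"end"` equation is removed, the machine stays in `FieldInfo` and the trailing run is dropped, which mirrors the behaviour reported above.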