Hi Jesse,

Thanks for the feedback. I'll ping you when I make the PR. Most of my code seems to work so far, but I'm still having some trouble with the fact that the fields now need to contain ParParts instead of Runs, which makes it harder to match all the cases and handle them correctly. I'll keep at it and let you know how it goes.

Best,
Milan

On Wednesday, June 16, 2021 at 4:21:05 PM UTC+2 Jesse Rosenthal wrote:

> Hi Milan,
>
> I wrote the original fldChar code (and that comment) and I figured it
> would have to evolve as further requirements became necessary. If nesting
> is a requirement, a stack instead of a toggle seems appropriate.
>
> As far as crossing paragraphs goes -- your approach seems right (and
> similar to how we've dealt with similar issues like comments crossing
> paragraphs in docx parsing).
>
> I'd be happy to take a look and offer comments/feedback on your code. Just
> make sure to ping me (@jkr) on your PRs.
>
> Best,
> Jesse
>
> ________________________________________
> From: pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf
> of Milan Bracke
> Sent: Wednesday, June 16, 2021 5:33 AM
> To: pandoc-discuss
> Subject: Re: docx parsing bug: nested fldChar fields are interpreted
> incorrectly
>
> I can't fix this without at least some feedback. It's a complex issue and
> the fix will take some time, so I need to at least know that my proposed
> solution seems good and would be accepted if implemented correctly.
>
> On Tuesday, June 15, 2021 at 8:38:30 AM UTC+2 Milan Bracke wrote:
> I've encountered a new problem. A fldChar field can span multiple
> paragraphs, but it doesn't have to start at the beginning of the first one.
> Because of this, a field across multiple paragraphs will cause those
> paragraphs to be merged.
> I don't think there is a way to represent this exactly in the pandoc model,
> so my current solution is to have different fields with the same field
> info in the different paragraphs.
> This can at least make the hyperlink
> fields work, and I think it will work for the other fields we might add in
> the future as well (I've checked the list).
> What do you think about this?
>
> On Monday, June 14, 2021 at 9:17:13 AM UTC+2 Milan Bracke wrote:
> For those who don't know fldChar fields, this comment from the docx parse
> code (parse.hs, starting on line 825) explains it:
>
> fldChar fields work by first
> having a <w:fldChar w:fldCharType="begin"/> in a run, then a run with
> <w:instrText>, then a <w:fldChar w:fldCharType="separate"/> run, then the
> content runs, and finally a <w:fldChar w:fldCharType="end"/> run. For
> example (omissions and my comments in brackets):
>
>   <w:r>
>     [...]
>     <w:fldChar w:fldCharType="begin"/>
>   </w:r>
>   <w:r>
>     [...]
>     <w:instrText xml:space="preserve"> HYPERLINK [hyperlink url] </w:instrText>
>   </w:r>
>   <w:r>
>     [...]
>     <w:fldChar w:fldCharType="separate"/>
>   </w:r>
>   <w:r>
>     [...]
>     <w:t>Foundations of Analysis, 2nd Edition</w:t>
>   </w:r>
>   <w:r>
>     [...]
>     <w:fldChar w:fldCharType="end"/>
>   </w:r>
>
> The current way of parsing fldChar fields doesn't take into account that
> they can be nested, so the end of a nested fldChar field is interpreted
> as the end of the surrounding one. This can, for example, lead to a
> hyperlink that ends too soon. See the attached example for a docx that
> demonstrates this.
>
> I propose to fix this by turning the fldChar state into a stack, so that a
> field can be started and ended inside other fields. I will include this in
> my pull request for PAGEREF fields that I announced here a while ago, since
> they are related.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/9bdb337d-fa68-4c66-8f5c-d4fa81547953n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a8b2c5dd-ca1d-494c-86ca-e4ad90544c5an%40googlegroups.com.
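For illustration, here is a minimal Haskell sketch of the two ideas discussed in this thread: keeping the fldChar state as a stack instead of a toggle, so that a nested field (such as a PAGEREF inside a HYPERLINK) is closed before its parent, and handling a field that crosses paragraphs by giving each paragraph its own field with the same field info. It is only a sketch under simplifying assumptions. It works on a hypothetical event type rather than on pandoc's real docx types (ParParts, Runs), and the names RunEvent, Node, Frame, emit, closeField, parseParagraph and parseParagraphs are made up here. It also does not distinguish a field's instruction part from its result part beyond collecting the instrText.

```haskell
-- Hypothetical types for this sketch; they are not pandoc's actual API.

-- Simplified run-level events from one docx paragraph.
data RunEvent
  = FldBegin           -- <w:fldChar w:fldCharType="begin"/>
  | InstrText String   -- text of a <w:instrText> element
  | FldSeparate        -- <w:fldChar w:fldCharType="separate"/>
  | FldEnd             -- <w:fldChar w:fldCharType="end"/>
  | Text String        -- an ordinary content run (<w:t>)
  deriving (Show, Eq)

-- Parsed output: plain text, or a field with its instruction and content.
data Node = Plain String | Field String [Node]
  deriving (Show, Eq)

-- One open field on the stack: raw instruction text and its content so far
-- (kept in reverse order). The head of the stack is the innermost field.
data Frame = Frame String [Node]

-- Add a node to the innermost open field, or to the paragraph if none is
-- open (paragraph-level nodes are also kept in reverse order).
emit :: Node -> [Frame] -> [Node] -> ([Frame], [Node])
emit n (Frame i cs : rest) acc = (Frame i (n : cs) : rest, acc)
emit n []                  acc = ([], n : acc)

-- Close the innermost open field and attach it to its parent. Whitespace in
-- the instruction is normalized. A stray "end" with nothing open is ignored.
closeField :: [Frame] -> [Node] -> ([Frame], [Node])
closeField (Frame i cs : rest) acc =
  emit (Field (unwords (words i)) (reverse cs)) rest acc
closeField [] acc = ([], acc)

-- Parse one paragraph. `carried` holds the raw instructions of fields left
-- open by the previous paragraph (innermost first); they are reopened here
-- with the same instruction. Returns the paragraph's nodes and the
-- instructions of fields that are still open at its end.
parseParagraph :: [String] -> [RunEvent] -> ([Node], [String])
parseParagraph carried = go [Frame i [] | i <- carried] []
  where
    go stack acc [] =
      (reverse (closeAll stack acc), [i | Frame i _ <- stack])
    go stack acc (e : es) = case e of
      FldBegin    -> go (Frame "" [] : stack) acc es
      InstrText t -> case stack of
        Frame i cs : rest -> go (Frame (i ++ t) cs : rest) acc es
        []                -> go stack acc es  -- instrText outside a field
      FldSeparate -> go stack acc es          -- phases are not tracked here
      FldEnd      -> uncurry go (closeField stack acc) es
      Text t      -> uncurry go (emit (Plain t) stack acc) es

    -- At the paragraph boundary, close every open field so that this
    -- paragraph gets its own copy of each one.
    closeAll [] acc = acc
    closeAll st acc = uncurry closeAll (closeField st acc)

-- Parse consecutive paragraphs, threading the open-field stack through them.
parseParagraphs :: [[RunEvent]] -> [[Node]]
parseParagraphs = go []
  where
    go _ [] = []
    go open (p : ps) =
      let (nodes, open') = parseParagraph open p
      in  nodes : go open' ps
```

With a toggle, the inner "end" would be taken as closing the outer field; with the stack, each "end" closes only the innermost open field. Field instructions still open at the end of a paragraph are carried into the next one, so a hyperlink spanning two paragraphs comes out as two Field nodes with the same instruction, one per paragraph. Loaded into GHCi, `parseParagraphs [[Text "a", FldEnd, Text "b"]]` gives `[[Plain "a",Plain "b"]]`, since a stray "end" is ignored.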