Hi Jesse,

I hope you had a good summer. Do you have time to look at my pull request in the coming weeks? I'm now using a fork of Pandoc to have this fix and I have to rebase every time something useful is done in the main repo, so I would really like to have this fix merged.

Best,
Milan

On Thursday, July 15, 2021 at 4:47:19 PM UTC+2 Jesse Rosenthal wrote:

> Hi Milan,
>
> Thanks for the heads up. Honestly just summer craziness: visiting family
> for the first time in almost two years, shuttling the kids around. Life
> stuff. I'll take a look at it ASAP.
>
> Best,
> Jesse
>
> ________________________________________
> From: pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf
> of Milan Bracke
> Sent: Thursday, July 15, 2021 10:43 AM
> To: pandoc-discuss
> Subject: Re: docx parsing bug: nested fldChar fields are interpreted
> incorrectly
>
> Hi all,
>
> I've had this pull request open for more than 3 weeks now:
> https://github.com/jgm/pandoc/pull/7401
> Is there a reason it's not getting any reaction? I'd be happy to improve
> or explain it. If I've done something wrong, I'd like to know, so I can
> fix it.
>
> Best,
> Milan
>
> On Thursday, June 17, 2021 at 8:42:48 AM UTC+2 Milan Bracke wrote:
> Hi Jesse,
>
> Thanks for the feedback. I'll ping you when making the PR. Most of my code
> seems to work so far, but I still have some trouble with the fact that the
> fields now need to contain ParParts instead of Runs. It's harder to match
> all the cases and treat them correctly. I'll try some more and let you
> know how it goes.
>
> Best,
> Milan
>
> On Wednesday, June 16, 2021 at 4:21:05 PM UTC+2 Jesse Rosenthal wrote:
> Hi Milan,
>
> I wrote the original fldChar code (and that comment) and I figured it
> would have to evolve as further requirements became necessary. If nesting
> is a requirement, a stack instead of a toggle seems appropriate.
>
> As far as crossing paragraphs goes -- your approach seems right (and
> similar to how we've dealt with similar issues like comments crossing
> paragraphs in docx parsing).
>
> I'd be happy to take a look and offer comments/feedback on your code. Just
> make sure to ping me (@jkr) on your PRs.
>
> Best,
> Jesse
>
> ________________________________________
> From: pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf
> of Milan Bracke
> Sent: Wednesday, June 16, 2021 5:33 AM
> To: pandoc-discuss
> Subject: Re: docx parsing bug: nested fldChar fields are interpreted
> incorrectly
>
> I can't fix this without at least some feedback. It's a complex issue and
> the fix will take some time, so I need to at least know that my proposed
> solution seems good and would be accepted if implemented correctly.
>
> On Tuesday, June 15, 2021 at 8:38:30 AM UTC+2 Milan Bracke wrote:
> I've encountered a new problem. A fldChar field can span multiple
> paragraphs, but it doesn't have to start at the beginning of the first one.
> Because of this, a field across multiple paragraphs will merge those
> paragraphs.
> There is no way to represent this exactly in the pandoc model, I think. So
> my current solution is to have different fields with the same field
> info in the different paragraphs. This can at least make the hyperlink
> fields work and I think it will work for the other fields we might add in
> the future as well (I've checked the list).
> What do you think about this?
>
> On Monday, June 14, 2021 at 9:17:13 AM UTC+2 Milan Bracke wrote:
> For those who don't know fldChar fields, this comment from the docx parse
> code (parse.hs, starting on line 825) explains it:
>
> fldChar fields work by first
> having a <w:fldChar fldCharType="begin"> in a run, then a run with
> <w:instrText>, then a <w:fldChar fldCharType="separate"> run, then the
> content runs, and finally a <w:fldChar fldCharType="end"> run. For
> example (omissions and my comments in brackets):
>
>       <w:r>
>         [...]
>         <w:fldChar w:fldCharType="begin"/>
>       </w:r>
>       <w:r>
>         [...]
>         <w:instrText> HYPERLINK [hyperlink url] </w:instrText>
>       </w:r>
>       <w:r>
>         [...]
>         <w:fldChar w:fldCharType="separate"/>
>       </w:r>
>       <w:r>
>         [...]
>         <w:t>Foundations of Analysis, 2nd Edition</w:t>
>       </w:r>
>       <w:r>
>         [...]
>         <w:fldChar w:fldCharType="end"/>
>       </w:r>
>
> The current way of parsing fldChar fields doesn't take into account that
> they can be nested. So the end of the nested fldChar field will be
> interpreted as the end of the surrounding one. This could for example lead
> to a hyperlink that ends too soon. See attached example for a docx that
> demonstrates this.
>
> I propose to fix this by turning the fldChar state into a stack, so that a
> field can be started and ended inside other fields. I will include this in
> my pull request for PAGEREF fields that I announced here a while ago, since
> they are related.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/9bdb337d-fa68-4c66-8f5c-d4fa81547953n%40googlegroups.com

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/50bcbdc6-8d4b-49c1-badb-f35fb968112dn%40googlegroups.com.
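
For readers following the thread, the two ideas discussed in it (a stack of open fields instead of a single toggle, and splitting a field that crosses a paragraph break into one field per paragraph with the same field info) can be sketched roughly as below. This is only an illustrative Python model, not pandoc's Haskell code and not the contents of the pull request. The Field class, the parse_paragraph function, and the (kind, payload) event tuples are hypothetical names invented for this sketch, and the URL and bookmark name are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple, Union


@dataclass
class Field:
    """Hypothetical stand-in for a parsed fldChar field."""
    instr: str = ""                 # field info collected from instrText runs
    content: List[Union[str, "Field"]] = field(default_factory=list)
    in_result: bool = False         # True once the "separate" marker was seen


Event = Tuple[str, Optional[str]]   # (kind, payload); an assumed input format


def parse_paragraph(events: Sequence[Event], carried: Sequence[Field] = ()):
    """Parse one paragraph; return (parts, fields still open at its end)."""
    output: List[Union[str, Field]] = []
    # Reopen fields left open by the previous paragraph as fresh fields
    # that share the same field info.
    stack = [Field(instr=f.instr, in_result=f.in_result) for f in carried]

    def emit(part):
        if not stack:
            output.append(part)
        elif stack[-1].in_result:
            stack[-1].content.append(part)
        # Parts seen before "separate" are dropped in this simplified sketch.

    for kind, payload in events:
        if kind == "begin":
            stack.append(Field())
        elif kind == "instr" and stack:
            stack[-1].instr += payload
        elif kind == "separate" and stack:
            stack[-1].in_result = True
        elif kind == "end" and stack:
            emit(stack.pop())       # closes only the innermost open field
        elif kind == "text":
            emit(payload)

    still_open = list(stack)        # outermost first
    while stack:                    # close them so this paragraph is complete
        emit(stack.pop())
    return output, still_open


# A hyperlink field whose result contains a nested PAGEREF field.
url_instr = 'HYPERLINK "https://example.org"'
nested = [
    ("begin", None), ("instr", url_instr), ("separate", None),
    ("text", "See page "),
    ("begin", None), ("instr", "PAGEREF _Toc1"), ("separate", None),
    ("text", "12"), ("end", None),
    ("text", " for details"),
    ("end", None),
    ("text", " (outside the link)"),
]
parts, still_open = parse_paragraph(nested)

# A hyperlink field that starts mid-paragraph and ends in the next paragraph.
p1 = [("text", "Intro "), ("begin", None), ("instr", url_instr),
      ("separate", None), ("text", "first half")]
p2 = [("text", "second half"), ("end", None), ("text", " tail")]
first, open1 = parse_paragraph(p1)
second, open2 = parse_paragraph(p2, open1)
```

With a single on/off toggle, the inner "end" in the nested example would close the hyperlink and " for details" would land outside it; with the stack it stays inside. In the two-paragraph example, each paragraph gets its own field carrying the same HYPERLINK info, so the paragraphs are not merged.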