Hi all,

I've had this pull request open for more than 3 weeks now:
https://github.com/jgm/pandoc/pull/7401

Is there a reason it's not getting any reaction? I'd be happy to improve
or explain it. If I've done something wrong, I'd like to know, so I can
fix it.

Best,
Milan

On Thursday, June 17, 2021 at 8:42:48 AM UTC+2 Milan Bracke wrote:

> Hi Jesse,
>
> Thanks for the feedback. I'll ping you when making the PR. Most of my code
> seems to work so far, but I still have some trouble with the fact that the
> fields now need to contain ParParts instead of Runs. It's harder to match
> all the cases and treat them correctly. I'll try some more and let you know
> how it goes.
>
> Best,
> Milan
>
> On Wednesday, June 16, 2021 at 4:21:05 PM UTC+2 Jesse Rosenthal wrote:
>
>> Hi Milan,
>>
>> I wrote the original fldChar code (and that comment) and I figured it
>> would have to evolve as further requirements became necessary. If nesting
>> is a requirement, a stack instead of a toggle seems appropriate.
>>
>> As far as crossing paragraphs goes -- your approach seems right (and
>> similar to how we've dealt with similar issues like comments crossing
>> paragraphs in docx parsing).
>>
>> I'd be happy to take a look and offer comments/feedback on your code.
>> Just make sure to ping me (@jkr) on your PRs.
>>
>> Best,
>> Jesse
>>
>> ________________________________________
>> From: pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on
>> behalf of Milan Bracke
>> Sent: Wednesday, June 16, 2021 5:33 AM
>> To: pandoc-discuss
>> Subject: Re: docx parsing bug: nested fldChar fields are interpreted
>> incorrectly
>>
>> I can't fix this without at least some feedback. It's a complex issue and
>> the fix will take some time, so I need to at least know that my proposed
>> solution seems good and would be accepted if implemented correctly.
>>
>> On Tuesday, June 15, 2021 at 8:38:30 AM UTC+2 Milan Bracke wrote:
>> I've encountered a new problem.
>> A fldChar field can span multiple paragraphs, but it doesn't have to
>> start at the beginning of the first one. Because of this, a field across
>> multiple paragraphs will merge those paragraphs.
>> There is no way to represent this exactly in the pandoc model, I think. So
>> my current solution is to have different fields with the same field
>> info in the different paragraphs. This can at least make the hyperlink
>> fields work and I think it will work for the other fields we might add in
>> the future as well (I've checked the list).
>> What do you think about this?
>>
>> On Monday, June 14, 2021 at 9:17:13 AM UTC+2 Milan Bracke wrote:
>> For those who don't know fldChar fields, this comment from the docx parse
>> code (parse.hs, starting on line 825) explains it:
>>
>> fldChar fields work by first
>> having a <w:fldChar w:fldCharType="begin"> in a run, then a run with
>> <w:instrText>, then a <w:fldChar w:fldCharType="separate"> run, then the
>> content runs, and finally a <w:fldChar w:fldCharType="end"> run. For
>> example (omissions and my comments in brackets):
>>
>>   <w:r>
>>     [...]
>>     <w:fldChar w:fldCharType="begin"/>
>>   </w:r>
>>   <w:r>
>>     [...]
>>     <w:instrText> HYPERLINK [hyperlink url] </w:instrText>
>>   </w:r>
>>   <w:r>
>>     [...]
>>     <w:fldChar w:fldCharType="separate"/>
>>   </w:r>
>>   <w:r>
>>     [...]
>>     <w:t>Foundations of Analysis, 2nd Edition</w:t>
>>   </w:r>
>>   <w:r>
>>     [...]
>>     <w:fldChar w:fldCharType="end"/>
>>   </w:r>
>>
>> The current way of parsing fldChar fields doesn't take into account that
>> they can be nested. So the end of the nested fldChar field will be
>> interpreted as the end of the surrounding one. This could, for example,
>> lead to a hyperlink that ends too soon. See the attached example for a
>> docx that demonstrates this.
>>
>> I propose to fix this by turning the fldChar state into a stack, so that
>> a field can be started and ended inside other fields. I will include this
>> in my pull request for PAGEREF fields that I announced here a while ago,
>> since they are related.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/9bdb337d-fa68-4c66-8f5c-d4fa81547953n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/24273fbf-2ce9-4c26-886b-50d504cb7b05n%40googlegroups.com.
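
For readers following the thread, the difference between a toggle and a stack for fldChar state can be illustrated with a small Python sketch. This is not pandoc's Haskell implementation and not the code in the pull request. The `Field` class, the `parse_runs` function and the `(kind, payload)` event tuples are hypothetical names invented for this illustration, and each run is reduced to a single event instead of real XML.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Field:
    """One fldChar field (hypothetical model, not a pandoc type)."""
    instr: str = ""                                   # collected instrText
    content: List[Union[str, "Field"]] = field(default_factory=list)
    in_result: bool = False                           # True after "separate"


def parse_runs(events: List[Tuple[str, str]]) -> List[Union[str, Field]]:
    """Group a flat run sequence into nested fields using a stack.

    Event kinds (an assumption of this sketch): "begin", "instr",
    "separate", "end" and "text".
    """
    top: List[Union[str, Field]] = []
    stack: List[Field] = []          # innermost open field is stack[-1]

    def emit(item: Union[str, Field]) -> None:
        if not stack:
            top.append(item)                     # outside any field
        elif stack[-1].in_result:
            stack[-1].content.append(item)       # result part of open field
        # Otherwise we are in the instruction part of the open field;
        # this sketch simply drops such items.

    for kind, payload in events:
        if kind == "begin":
            stack.append(Field())
        elif kind == "instr":
            if stack and not stack[-1].in_result:
                stack[-1].instr += payload
        elif kind == "separate":
            if stack:
                stack[-1].in_result = True
        elif kind == "end":
            if stack:
                finished = stack.pop()           # closes only the innermost field
                emit(finished)
        elif kind == "text":
            emit(payload)
    # Fields still open at the end are discarded in this sketch.
    return top


example = [
    ("text", "See "),
    ("begin", ""), ("instr", "HYPERLINK https://example.org"), ("separate", ""),
    ("text", "Foundations "),
    ("begin", ""), ("instr", "PAGEREF sec1"), ("separate", ""),
    ("text", "12"),
    ("end", ""),                                 # ends PAGEREF only
    ("text", " of Analysis"),
    ("end", ""),                                 # ends HYPERLINK
    ("text", "."),
]
parsed = parse_runs(example)
```

With a single on/off toggle, the first "end" event would close the hyperlink right after "12". With the stack, it only pops the inner PAGEREF field, so " of Analysis" still lands inside the hyperlink.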