No worries. Thanks for taking a look. I responded to your question in the pull request.

On Sunday, October 17, 2021 at 1:02:22 PM UTC+2 Jesse Rosenthal wrote:
Dear Milan,

Just commented on GitHub. This looks good to me. I apologize for the long wait here, and for taking so long to turn my attention to this.

Thanks for making this work, and for sharing it with everyone else. Sorry to stand in the way of that process being a bit smoother.

Best,
Jesse

________________________________________
From: pandoc-...@googlegroups.com <pandoc-...@googlegroups.com> on behalf of Milan Bracke <milan....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Thursday, October 14, 2021 5:33 AM
To: pandoc-discuss
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly

Hi Jesse,

I hope you had a good summer. Do you have time to look at my pull request in the coming weeks?
I'm now using a fork of Pandoc to have this fix and I have to rebase every time something useful is done
in the main repo, so I would really like to have this fix merged.

Best,
Milan

On Thursday, July 15, 2021 at 4:47:19 PM UTC+2 Jesse Rosenthal wrote:
Hi Milan,

Thanks for the heads up. Honestly just summer craziness: visiting family for the first time in almost two years, shuttling the kids around. Life stuff. I'll take a look at it ASAP.

Best,
Jesse

________________________________________
From: pandoc-...@googlegroups.com <pandoc-...@googlegroups.com> on behalf of Milan Bracke <milan....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Thursday, July 15, 2021 10:43 AM
To: pandoc-discuss
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly

Hi all,

I've had this pull request open for more than 3 weeks now: https://github.com/jgm/pandoc/pull/7401
Is there a reason it's not getting any reaction? I'd be happy to improve or explain it. If I've done something
wrong, I'd like to know, so I can fix it.

Best,
Milan

On Thursday, June 17, 2021 at 8:42:48 AM UTC+2 Milan Bracke wrote:
Hi Jesse,

Thanks for the feedback. I'll ping you when making the PR. Most of my code seems to work so far, but I still
have some trouble with the fact that the fields now need to contain ParParts instead of Runs. It's harder to
match all the cases and treat them correctly. I'll try some more and let you know how it goes.

Best,
Milan

On Wednesday, June 16, 2021 at 4:21:05 PM UTC+2 Jesse Rosenthal wrote:
Hi Milan,

I wrote the original fldChar code (and that comment) and I figured it would have to evolve as further requirements became necessary. If nesting is a requirement, a stack instead of a toggle seems appropriate.

As far as crossing paragraphs goes -- your approach seems right (and similar to how we've dealt with similar issues like comments crossing paragraphs in docx parsing).

I'd be happy to take a look and offer comments/feedback on your code. Just make sure to ping me (@jkr) on your PRs.

Best,
Jesse

________________________________________
From: pandoc-...@googlegroups.com <pandoc-...@googlegroups.com> on behalf of Milan Bracke <milan....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Wednesday, June 16, 2021 5:33 AM
To: pandoc-discuss
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly

I can't fix this without at least some feedback. It's a complex issue and the fix will take some time, so I need to at least know that my proposed solution
seems good and would be accepted if implemented correctly.

On Tuesday, June 15, 2021 at 8:38:30 AM UTC+2 Milan Bracke wrote:
I've encountered a new problem. A fldChar field can span multiple paragraphs, but it doesn't have to start at the beginning of the first one.
Because of this, treating a field that spans multiple paragraphs as a single element would merge those paragraphs.
I don't think there is a way to represent this exactly in the pandoc model. So my current solution is to have different fields with the same field
info in the different paragraphs. This at least makes the hyperlink fields work, and I think it will work for the other fields we might add in
the future as well (I've checked the list).
What do you think about this?
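
To make the idea of repeating the field info per paragraph concrete, here is a minimal Python sketch. It is only an illustration, not pandoc's Haskell code: the token tuples, the function name split_field_over_paragraphs, and the example URL are all hypothetical, and it handles a single non-nested field only.

```python
# Hypothetical token model (an assumption for this sketch): each run is
# reduced to one tuple: ("begin",), ("instr", s), ("separate",),
# ("text", s) or ("end",).

def split_field_over_paragraphs(paragraphs):
    """Return one output list per paragraph. Items are plain strings or
    (instr, [strings]) pairs; a field that crosses a paragraph boundary
    yields one pair per paragraph, all carrying the same instr."""
    out = []
    open_instr = None   # instruction text of the field currently open
    in_result = False   # True once "separate" has been seen
    for tokens in paragraphs:
        para = []
        current = None  # content list of this paragraph's field piece
        if open_instr is not None and in_result:
            # The field is still open from the previous paragraph:
            # start a new piece with the same field info.
            current = []
            para.append((open_instr, current))
        for tok in tokens:
            kind = tok[0]
            if kind == "begin":
                open_instr, in_result = "", False
            elif kind == "instr" and open_instr is not None and not in_result:
                open_instr += tok[1]
            elif kind == "separate" and open_instr is not None:
                in_result = True
                current = []
                para.append((open_instr, current))
            elif kind == "end":
                open_instr, in_result, current = None, False, None
            elif kind == "text":
                (current if current is not None else para).append(tok[1])
        # Drop field pieces that ended up with no content in this paragraph.
        para = [p for p in para if not (isinstance(p, tuple) and not p[1])]
        out.append(para)
    return out


paragraphs = [
    [("text", "Intro "), ("begin",),
     ("instr", ' HYPERLINK "https://example.com" '),
     ("separate",), ("text", "first part")],
    [("text", "second part"), ("end",), ("text", " outro")],
]
pieces = split_field_over_paragraphs(paragraphs)
```

Both paragraphs keep their own boundaries, and each gets its own hyperlink piece with identical field info, which is the trade-off described above.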

On Monday, June 14, 2021 at 9:17:13 AM UTC+2 Milan Bracke wrote:
For those who don't know fldChar fields, this comment from the docx parse code (Parse.hs, starting on line 825) explains it:

fldChar fields work by first
having a <w:fldChar fldCharType="begin"> in a run, then a run with
<w:instrText>, then a <w:fldChar fldCharType="separate"> run, then the
content runs, and finally a <w:fldChar fldCharType="end"> run. For
example (omissions and my comments in brackets):

<w:r>
[...]
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
[...]
<w:instrText xml:space="preserve"> HYPERLINK [hyperlink url] </w:instrText>
</w:r>
<w:r>
[...]
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidRPr=[...]>
[...]
<w:t>Foundations of Analysis, 2nd Edition</w:t>
</w:r>
<w:r>
[...]
<w:fldChar w:fldCharType="end"/>
</w:r>
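
As a rough illustration of the run sequence above, the following Python sketch reduces the runs of one paragraph to a flat token list using the standard library's ElementTree. It is not how pandoc does it; the function name run_tokens, the token tuples, and the example URL are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_tokens(paragraph_xml):
    """Reduce the runs of one <w:p> to tuples: ("begin",), ("instr", s),
    ("separate",), ("text", s) or ("end",)."""
    root = ET.fromstring(paragraph_xml)
    tokens = []
    for run in root.iter(f"{{{W}}}r"):
        for child in run:
            if child.tag == f"{{{W}}}fldChar":
                # The fldCharType attribute is "begin", "separate" or "end".
                tokens.append((child.get(f"{{{W}}}fldCharType"),))
            elif child.tag == f"{{{W}}}instrText":
                tokens.append(("instr", child.text or ""))
            elif child.tag == f"{{{W}}}t":
                tokens.append(("text", child.text or ""))
    return tokens


sample = """<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> HYPERLINK "https://example.com" </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r><w:t>Foundations of Analysis, 2nd Edition</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>"""
tokens = run_tokens(sample)
```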

The current way of parsing fldChar fields doesn't take into account that they can be nested, so the end of a nested fldChar field is interpreted as the end of the surrounding one. This could, for example, lead to a hyperlink that ends too soon. See the attached example for a docx that demonstrates this.

I propose to fix this by turning the fldChar state into a stack, so that a field can be started and ended inside other fields. I will include this in my pull request for PAGEREF fields that I announced here a while ago, since they are related.
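
The stack idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the pull request: the Field class, the parse_runs function, the token tuples, and the example field instructions are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical token model (an assumption for this sketch): ("begin",),
# ("instr", s), ("separate",), ("text", s) or ("end",).


@dataclass
class Field:
    instr: str = ""
    content: list = field(default_factory=list)  # strings or nested Fields
    in_result: bool = False  # True once "separate" has been seen


def parse_runs(tokens):
    """Group tokens into nested Field objects using a stack of open fields.
    An "end" closes only the innermost open field. Fields still open at the
    end of input are dropped in this sketch."""
    top_level = []
    stack = []  # the innermost open field is stack[-1]
    for tok in tokens:
        kind = tok[0]
        if kind == "begin":
            stack.append(Field())
        elif kind == "instr":
            if stack and not stack[-1].in_result:
                stack[-1].instr += tok[1]
        elif kind == "separate":
            if stack:
                stack[-1].in_result = True
        elif kind == "end":
            if stack:
                done = stack.pop()
                # Attach the finished field to its parent, if there is one.
                target = stack[-1].content if stack else top_level
                target.append(done)
        elif kind == "text":
            target = stack[-1].content if stack else top_level
            target.append(tok[1])
    return top_level


tokens = [
    ("begin",), ("instr", ' HYPERLINK "https://example.com" '), ("separate",),
    ("text", "see page "),
    ("begin",), ("instr", " PAGEREF _Toc1 "), ("separate",),
    ("text", "4"), ("end",),
    ("text", " for details"), ("end",),
    ("text", " tail"),
]
result = parse_runs(tokens)
```

With a single toggle, the first ("end",) would close the hyperlink and " for details" would fall outside it. With the stack, that token stays inside the outer field and only " tail" ends up at the top level.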

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9bdb337d-fa68-4c66-8f5c-d4fa81547953n%40googlegroups.com.
