From: Milan Bracke
Newsgroups: gmane.text.pandoc
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly
Date: Mon, 18 Oct 2021 00:07:44 -0700 (PDT)
To: pandoc-discuss

No worries. Thanks for taking a look. I responded to your question in the pull request.

On Sunday, October 17, 2021 at 1:02:22 PM UTC+2 Jesse Rosenthal wrote:
Dear Milan,

Just commented on GitHub. This looks good to me. I apologize for the long wait here, and for taking so long to turn my attention to this.

Thanks for making this work, and for sharing it with everyone else. Sorry to stand in the way of that process being a bit smoother.

Best,
Jesse

________________________________________
From: pandoc-...@googlegroups.com on behalf of Milan Bracke <milan....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Thursday, October 14, 2021 5:33 AM
To: pandoc-discuss
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly

Hi Jesse,

I hope you had a good summer. Do you have time to look at my pull request in the coming weeks?
I'm now using a fork of Pandoc to have this fix and I have to rebase every time something useful is done
in the main repo, so I would really like to have this fix merged.

Best,
Milan

On Thursday, July 15, 2021 at 4:47:19 PM UTC+2 Jesse Rosenthal wrote:
Hi Milan,

Thanks for the heads up. Honestly just summer craziness: visiting family for the first time in almost two years, shuttling the kids around. Life stuff. I'll take a look at it ASAP.

Best,
Jesse

________________________________________
From: pandoc-...@googlegroups.com on behalf of Milan Bracke <milan....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Thursday, July 15, 2021 10:43 AM
To: pandoc-discuss
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly

Hi all,

I've had this pull request open for more than 3 weeks now: https://github.com/jgm/pandoc/pull/7401
Is there a reason it's not getting any reaction? I'd be happy to improve or explain it. If I've done something
wrong, I'd like to know, so I can fix it.

Best,
Milan

On Thursday, June 17, 2021 at 8:42:48 AM UTC+2 Milan Bracke wrote:
Hi Jesse,

Thanks for the feedback. I'll ping you when making the PR. Most of my code seems to work so far, but I still
have some trouble with the fact that the fields now need to contain ParParts instead of Runs. It's harder to
match all the cases and treat them correctly. I'll try some more and let you know how it goes.

Best,
Milan
On Wednesday, June 16, 2021 at 4:21:05 PM UTC+2 Jesse Rosenthal wrote:
Hi Milan,

I wrote the original fldChar code (and that comment) and I figured it would have to evolve as further requirements became necessary. If nesting is a requirement, a stack instead of a toggle seems appropriate.

As far as crossing paragraphs goes -- your approach seems right (and similar to how we've dealt with related issues like comments crossing paragraphs in docx parsing).

I'd be happy to take a look and offer comments/feedback on your code. Just make sure to ping me (@jkr) on your PRs.

Best,
Jesse

________________________________________
From: pandoc-...@googlegroups.com on behalf of Milan Bracke <milan....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Wednesday, June 16, 2021 5:33 AM
To: pandoc-discuss
Subject: Re: docx parsing bug: nested fldChar fields are interpreted incorrectly

I can't fix this without at least some feedback. It's a complex issue and the fix will take some time, so I need to at least know that my proposed solution
seems good and would be accepted if implemented correctly.

On Tuesday, June 15, 2021 at 8:38:30 AM UTC+2 Milan Bracke wrote:
I've encountered a new problem. A fldChar field can span multiple paragraphs, but it doesn't have to start at the beginning of the first one.
Because of this, a field across multiple paragraphs will merge those paragraphs.
I don't think there is a way to represent this exactly in the pandoc model. So my current solution is to have different fields with the same field
info in the different paragraphs. This can at least make the hyperlink fields work and I think it will work for the other fields we might add in
the future as well (I've checked the list).
What do you think about this?
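
A minimal sketch of the per-paragraph splitting described above, written in Haskell because the docx reader is. The types `Marker` and `Part`, the function `splitAcross`, and the placeholder instruction string are hypothetical illustrations, not pandoc's actual data model. The begin/instrText/separate runs are collapsed into a single `Begin` marker and nesting is ignored to keep it short.

```haskell
-- Hypothetical, simplified model: Begin carries the field info directly.
data Marker = Begin String | Text String | End
  deriving (Eq, Show)

-- Plain text, or a field (info plus the text pieces it covers).
data Part = Plain String | Field String [String]
  deriving (Eq, Show)

-- Process paragraphs in order. A field still open at the end of a paragraph
-- is closed there and reopened with the same info in the next paragraph.
splitAcross :: [[Marker]] -> [[Part]]
splitAcross = go Nothing
  where
    go _ [] = []
    go open (p : ps) =
      let (parts, open') = para open p
      in  parts : go open' ps

    -- Fold over one paragraph; state is (open field, output in reverse).
    para open markers =
      finish (foldl step (fmap (\i -> (i, [])) open, []) markers)

    step (Nothing, out)      (Begin i) = (Just (i, []), out)
    step (Nothing, out)      (Text t)  = (Nothing, Plain t : out)
    step (Just (i, ts), out) (Text t)  = (Just (i, t : ts), out)
    step (Just (i, ts), out) End       = (Nothing, close i ts out)
    step st                  _         = st  -- stray markers are ignored

    finish (Nothing, out)      = (reverse out, Nothing)
    finish (Just (i, ts), out) = (reverse (close i ts out), Just i)

    -- Emit a field piece only if it covers some text in this paragraph.
    close i ts out
      | null ts   = out
      | otherwise = Field i (reverse ts) : out

-- A hyperlink that starts mid-paragraph and ends in the next paragraph.
paras :: [[Marker]]
paras =
  [ [Text "See ", Begin "HYPERLINK url", Text "the first"]
  , [Text "and second part", End, Text " of the link."]
  ]
```

Here `splitAcross paras` yields two paragraphs, each with its own `Field "HYPERLINK url"` piece, so the paragraphs stay separate while both pieces keep the same field info.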

On Monday, June 14, 2021 at 9:17:13 AM UTC+2 Milan Bracke wrote:
For those who don't know fldChar fields, this comment from the docx parse code (parse.hs, starting on line 825) explains it:

fldChar fields work by first
having a <w:fldChar fldCharType="begin"> in a run, then a run with
<w:instrText>, then a <w:fldChar fldCharType="separate"> run, then the
content runs, and finally a <w:fldChar fldCharType="end"> run. For
example (omissions and my comments in brackets):

<w:r>
[...]
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
[...]
<w:instrText xml:space="preserve"> HYPERLINK [hyperlink url] </w:instrText>
</w:r>
<w:r>
[...]
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidRPr=[...]>
[...]
<w:t>Foundations of Analysis, 2nd Edition</w:t>
</w:r>
<w:r>
[...]
<w:fldChar w:fldCharType="end"/>
</w:r>
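
To make the run sequence in the example concrete, here is a small Haskell sketch that models each run as a marker and pulls the instruction and displayed text out of one flat (non-nested) field. The `Marker` type and `flatField` function are hypothetical names chosen for illustration; they are not the types used in pandoc's parser.

```haskell
-- Hypothetical model of the runs in the example; not pandoc's actual types.
data Marker
  = Begin           -- <w:fldChar w:fldCharType="begin"/>
  | Instr String    -- <w:instrText>...</w:instrText>
  | Separate        -- <w:fldChar w:fldCharType="separate"/>
  | Text String     -- <w:t>...</w:t>
  | End             -- <w:fldChar w:fldCharType="end"/>
  deriving (Eq, Show)

-- Split one flat field into (instruction, displayed text).
-- Returns Nothing unless the markers follow begin, instr*, separate, text*, end.
flatField :: [Marker] -> Maybe (String, String)
flatField (Begin : rest) =
  case span isInstr rest of
    (instrs, Separate : content) ->
      case span isText content of
        (texts, [End]) ->
          Just (concat [s | Instr s <- instrs], concat [t | Text t <- texts])
        _ -> Nothing
    _ -> Nothing
flatField _ = Nothing

isInstr :: Marker -> Bool
isInstr (Instr _) = True
isInstr _         = False

isText :: Marker -> Bool
isText (Text _) = True
isText _        = False

-- The example field from the comment, with the bracketed placeholder kept.
example :: [Marker]
example =
  [ Begin
  , Instr " HYPERLINK [hyperlink url] "
  , Separate
  , Text "Foundations of Analysis, 2nd Edition"
  , End
  ]
```

`flatField example` gives `Just (" HYPERLINK [hyperlink url] ", "Foundations of Analysis, 2nd Edition")`. A second `Begin` inside the content makes it return `Nothing`, which is exactly the nesting case a flat reading cannot handle.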

The current way of parsing fldChar fields doesn't take into account that they can be nested. So the end of the nested fldChar field will be interpreted as the end of the surrounding one. This could for example lead to a hyperlink that ends too soon. See attached example for a docx that demonstrates this.

I propose to fix this by turning the fldChar state into a stack, so that a field can be started and ended inside other fields. I will include this in my pull request for PAGEREF fields that I announced here a while ago, since they are related.
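
The stack idea could look roughly like the following Haskell sketch. It is only an illustration under stated assumptions: `Marker`, `Part`, `Open` and `parseFields` are hypothetical names, not the code from the pull request, and nested fields are assumed to sit in the result part of their parent (after its "separate" marker).

```haskell
-- Hypothetical marker stream; not pandoc's actual types.
data Marker = Begin | Instr String | Separate | Text String | End
  deriving (Eq, Show)

-- Plain text, or a field whose result may contain further fields.
data Part = Plain String | Field String [Part]
  deriving (Eq, Show)

-- One open field: instruction so far, whether "separate" was seen,
-- and its result parts in reverse order.
data Open = Open String Bool [Part]

-- "begin" pushes a new open field; "end" pops only the innermost one and
-- hands the finished field to its parent (or to the top-level output).
parseFields :: [Marker] -> [Part]
parseFields = go [] []
  where
    go _     out []       = reverse out  -- unclosed fields are dropped here
    go stack out (m : ms) = case (m, stack) of
      (Begin, _)                      -> go (Open "" False [] : stack) out ms
      (Instr s, Open i False ps : st) -> go (Open (i ++ s) False ps : st) out ms
      (Separate, Open i _ ps : st)    -> go (Open i True ps : st) out ms
      (Text t, Open i True ps : st)   -> go (Open i True (Plain t : ps) : st) out ms
      (Text t, [])                    -> go [] (Plain t : out) ms
      (End, Open i _ ps : st)         -> emit (Field i (reverse ps)) st out ms
      _                               -> go stack out ms  -- ignore the unexpected

    emit part (Open i sep ps : st) out ms = go (Open i sep (part : ps) : st) out ms
    emit part []                   out ms = go [] (part : out) ms

-- A hyperlink whose displayed content contains a nested PAGEREF field.
-- "url" and "ref" are placeholders, not real field arguments.
nested :: [Marker]
nested =
  [ Begin, Instr "HYPERLINK url", Separate, Text "see page "
  , Begin, Instr "PAGEREF ref", Separate, Text "12", End
  , Text " for details", End
  ]
```

With the stack, the inner `End` closes only the PAGEREF field, so " for details" stays inside the hyperlink. With a single toggle, the hyperlink would have ended right after "12".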

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9bdb337d-fa68-4c66-8f5c-d4fa81547953n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d537ca16-e31c-4b90-ab6d-6a5e80200618n%40googlegroups.com.