From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.io/gmane.text.pandoc/28782 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Gary Glass Newsgroups: gmane.text.pandoc Subject: Re: File splitting bug Date: Thu, 8 Jul 2021 22:56:05 -0700 (PDT) Message-ID: <53332f68-2c48-4416-91b8-8e34395d0859n@googlegroups.com> References: <297bc662-7841-4423-bcbb-534e99bbba09n@googlegroups.com> <38ac5d4c-8cba-4c23-a313-bf81e79779e7n@googlegroups.com> <5d588eaf-0fd8-4023-8296-b9748189593cn@googlegroups.com> Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_310_1620057043.1625810165040" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="40541"; mail-complaints-to="usenet@ciao.gmane.io" To: pandoc-discuss Original-X-From: pandoc-discuss+bncBCO3BB5GY4ARB5WJT6DQMGQEK6FDXJQ-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org Fri Jul 09 07:56:08 2021 Return-path: Envelope-to: gtp-pandoc-discuss@m.gmane-mx.org Original-Received: from mail-oi1-f187.google.com ([209.85.167.187]) by ciao.gmane.io with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1m1jUe-000ALP-CE for gtp-pandoc-discuss@m.gmane-mx.org; Fri, 09 Jul 2021 07:56:08 +0200 Original-Received: by mail-oi1-f187.google.com with SMTP id m13-20020a056808024db029023e63851e0dsf6037339oie.8 for ; Thu, 08 Jul 2021 22:56:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlegroups.com; s=20161025; h=sender:date:from:to:message-id:in-reply-to:references:subject :mime-version:x-original-sender:reply-to:precedence:mailing-list :list-id:list-post:list-help:list-archive:list-subscribe :list-unsubscribe; bh=mcahWbXuIINfJ5s4UP0vLMt86diKD5IlBM0e68bTWg8=; b=ei0abSuN8d9WzuHIIJq23pRhQr0+Fndy4hpe1fnvFN7C3hMLBR9l6Osrx7rwm9ZwCa uSXqn1BGze7CgB7LZ/ua7X4g5yxdPn3h5IJU19l5LGdo+lrfDgBgIWHJkisDtMjkELPf 
UaTXVJ9bG8vI7KF6y0+cEk81XDebjZ17VxKRsNaJOGC+U12AL+CjERGthEB9zSVBVAVZ 1pqm+3fgH1r7pGWRElYwnKAH8IWERckKf/QoqO988+gdW1ukJfuzw1rwk+J3K4GIF9BK 8prWhWyF6Apxj0C88nHW2RVO0CVKjDttb7g328wzVaYdz1ywB8oWC0eBHOAnX/vhh9WG Vh3g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:message-id:in-reply-to:references:subject:mime-version :x-original-sender:reply-to:precedence:mailing-list:list-id :list-post:list-help:list-archive:list-subscribe:list-unsubscribe; bh=mcahWbXuIINfJ5s4UP0vLMt86diKD5IlBM0e68bTWg8=; b=iTcyIOMsVdcUz5vXyg224NiAWrrXQBXrjxAq4VDjUILcK9bS2yye+QOHe9dd8PdWO5 3BG/01Ph3NSU5z20mC9v7Ut2eeKtODIxLXK1JGVTBxm/XsiDM79UMcwDyFgiYdIqWrFe Yxl6zeQuvOvyBr53k49jhnhRfClFbeaVvtzcG4GZx4s9+YvjuaAaERT65VUDjMcFS4Nv WEhMnrDFswULNCqDi7d8Y5QADLXFloqJHiYpVI29URspW5Xu6DJI2lsQ4/zRk2Y72rM2 cpTywClo5IZHOEQUUrUIjVYpx+xQfeCCcm88J7q5YVJl5WF3Tf8Iuj/2yZVSnQi1wKcI W3Xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=sender:x-gm-message-state:date:from:to:message-id:in-reply-to :references:subject:mime-version:x-original-sender:reply-to :precedence:mailing-list:list-id:x-spam-checked-in-group:list-post :list-help:list-archive:list-subscribe:list-unsubscribe; bh=mcahWbXuIINfJ5s4UP0vLMt86diKD5IlBM0e68bTWg8=; b=LAPyvEakXR2tVEXEIxMYE4gKRJ5HjbYfYsQeXyoFwMgjMh/PI9nin18q1lSMiVm4tT 9Fu5VtIJoBmVtY29ab8DrnkHsppeC2ZkHm91paaJ3Fqcs04fyS7qZKXNvX/TyF0k+ET8 eEouG1OICUTKbZWpOgJ9M9Rl2lBkUHBTM8crj18g5fJvmgnGon8BopoJaCz0JmL870pN PZBL5ayAQtBa3y1NgATYGiB6EzQtlaP04SqosxiY8cY/Y05NDGHjiMpB6di4X/2CeLPx JxKwSgClOA3rHNl9CXhBbNrRNr+eOps9dvIrSOtc2maAx22AZfNTx0LJTXpwhJwRYHKJ 1crQ== Original-Sender: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org X-Gm-Message-State: AOAM531zSD/XL8xUXkwDDzf0L/lLZ6azBk6syy+jqM/Y/0ZrRGcNlQni f/J3wu0855npMQAwIne/OOU= X-Google-Smtp-Source: ABdhPJy/DW+s55SPQkJVWtGQXIFn022eC9WQezOAco+9pNNhpBnIjSMEaUiFiLpTzvK85+4+VSGUVQ== X-Received: by 2002:a05:6808:1144:: with SMTP id u4mr6729965oiu.133.1625810167383; Thu, 08 Jul 
2021 22:56:07 -0700 (PDT) X-BeenThere: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org Original-Received: by 2002:a4a:1842:: with SMTP id 63ls382562ooo.11.gmail; Thu, 08 Jul 2021 22:56:06 -0700 (PDT) X-Received: by 2002:a4a:b203:: with SMTP id d3mr25152499ooo.55.1625810165803; Thu, 08 Jul 2021 22:56:05 -0700 (PDT) In-Reply-To: X-Original-Sender: garyglassphotography-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org Precedence: list Mailing-list: list pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org; contact pandoc-discuss+owners-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org List-ID: X-Google-Group-Id: 1007024079513 List-Post: , List-Help: , List-Archive: , List-Unsubscribe: , Xref: news.gmane.io gmane.text.pandoc:28782 Archived-At: ------=_Part_310_1620057043.1625810165040 Content-Type: multipart/alternative; boundary="----=_Part_311_1328582213.1625810165040" ------=_Part_311_1328582213.1625810165040 Content-Type: text/plain; charset="UTF-8"
Well for some reason that doesn't work. The pandoc.exe just hangs when I run it.

On Wednesday, July 7, 2021 at 6:48:04 PM UTC+2 John MacFarlane wrote:

>
> You can try a nightly.
> https://github.com/jgm/pandoc/actions/runs/1007239404
>
> Gary Glass writes:
>
> > Is there an installer for that rev? I'll be happy to test it.
> >
> > On Tuesday, July 6, 2021 at 7:26:26 PM UTC+2 John MacFarlane wrote:
> >
> >>
> >> OK, I think I've fixed this in
> >> commit f88ebf3ebf49e00ffa12778caf6817cc34459e6a
> >>
> >> John MacFarlane writes:
> >>
> >> > Another big clue: if you remove the <col/> elements
> >> > from the <colgroup>, it works again. It also works if you use
> >> >
> >> > <col></col>
> >> >
> >> > instead of
> >> >
> >> > <col />
> >> >
> >> >
> >> >
> >> > John MacFarlane writes:
> >> >
> >> >> Thank you for the minimal test case!
> >> >> Actually one can see the issue just with
> >> >>
> >> >> pandoc --section-divs bug.md
> >> >>
> >> >> At the end there is
> >> >>
> >> >> </div>
> >> >> </section>
> >> >> </section>
> >> >>
> >> >> where you'd want
> >> >>
> >> >> </section>
> >> >> </div>
> >> >> </section>
> >> >>
> >> >> The difference is that, with the colgroup, the <div> tags are
> >> >> being parsed as raw HTML blocks, while without it, we get a
> >> >> native Div in the AST (which is what we want in this case).
> >> >>
> >> >> Somehow the colgroup is interfering with parsing of the native
> >> >> Div.
> >> >>
> >> >> If you don't mind reporting this at
> >> >> https://github.com/jgm/pandoc/issues (including this information)
> >> >> it will help us keep track. Looking at the code, I currently
> >> >> have no idea why this is happening.
> >> >>
> >> >> Gary Glass writes:
> >> >>
> >> >>> Here's the simplest file I could make to repro the issue. The pandoc
> >> >>> command is very simple:
> >> >>>
> >> >>> pandoc --output=bug.epub --to=epub3 bug.md
> >> >>>
> >> >>> It produces an HTML file with a mismatched section tag.
> >> >>>
> >> >>> If you comment out the colgroup, the output is fine.
> >> >>>
> >> >>> On Friday, July 2, 2021 at 6:55:27 PM UTC+2 John MacFarlane wrote:
> >> >>>
> >> >>>>
> >> >>>> Pandoc won't emit invalid HTML itself, but if you include
> >> >>>> invalid HTML, it just dutifully passes it through verbatim.
> >> >>>>
> >> >>>> Checking HTML syntax is not pandoc's job. Use epubcheck
> >> >>>> to verify the EPUB if you like.
> >> >>>>
> >> >>>> Gary Glass writes:
> >> >>>>
> >> >>>> > I figured out the source of the issue. I had an html table in the markdown
> >> >>>> > and I added a colgroup to the table. The colgroup caused the problem.
> >> >>>> > Removing it made it go away.
> >> >>>> >
> >> >>>> > Colgroup is not a commonly used tag (in my experience), but I think the bug
> >> >>>> > is that pandoc shouldn't just emit invalid epub html when the source code
> >> >>>> > is valid, even if it doesn't know what to do with it. Report an error or
> >> >>>> > something! The html looked something like this:
> >> >>>> >
> >> >>>> > <table>
> >> >>>> > <colgroup>
> >> >>>> > <col />
> >> >>>> > <col style="width: 33%;" />
> >> >>>> > <col />
> >> >>>> > </colgroup>
> >> >>>> > <tr>
> >> >>>> > <td>...</td>
> >> >>>> > <td>...</td>
> >> >>>> > <td>...</td>
> >> >>>> > </tr>
> >> >>>> > ...
> >> >>>> > </table>
> >> >>>> >
> >> >>>> > On Thursday, July 1, 2021 at 5:57:57 PM UTC+2 John MacFarlane wrote:
> >> >>>> >
> >> >>>> >>
> >> >>>> >> No ideas. We'd have to see the actual files to know more.
> >> >>>> >>
> >> >>>> >>
> >> >>>> >
> >> >>>> > --
> >> >>>> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> >> >>>> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> >> >>>> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/38ac5d4c-8cba-4c23-a313-bf81e79779e7n%40googlegroups.com.
> >> >>>>
> >> >>>
> >> >>> --
> >> >>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> >> >>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> >> >>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fd258aa4-a793-4d12-bb15-3f55fc2d0e4an%40googlegroups.com.
> >> >>> # header 1
> >> >>>
> >> >>> <div>
> >> >>>
> >> >>> ## header 2
> >> >>>
> >> >>> <table>
> >> >>> <colgroup>
> >> >>> <col />
> >> >>> <col />
> >> >>> <col />
> >> >>> </colgroup>
> >> >>> <thead>
> >> >>> <tr>
> >> >>> <th>a</th>
> >> >>> <th>b</th>
> >> >>> <th>c</th>
> >> >>> </tr>
> >> >>> </thead>
> >> >>> <tbody>
> >> >>> <tr>
> >> >>> <td>xxx</td>
> >> >>> <td>xxx</td>
> >> >>> <td>xxx</td>
> >> >>> </tr>
> >> >>> </tbody>
> >> >>> </table>
> >> >>>
> >> >>> </div>
> >>
> >
> > --
> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/5d588eaf-0fd8-4023-8296-b9748189593cn%40googlegroups.com.
>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/53332f68-2c48-4416-91b8-8e34395d0859n%40googlegroups.com.
------=_Part_311_1328582213.1625810165040 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Well for some reason that doesn't work. The pandoc.exe just hangs when I run it.

On Wednesday, July 7, 2021 at 6:48:04 PM UTC+2 John MacFarlane wrote:

You can try a nightly.
https://github.com/jgm/pandoc/actions/runs/1007239404

Gary Glass <garyglassp...@gmail.com> writes:

> Is there an installer for that rev? I'll be happy to test it.
>
> On Tuesday, July 6, 2021 at 7:26:26 PM UTC+2 John MacFarlane wrote:
>
>>
>> OK, I think I've fixed this in
>> commit f88ebf3ebf49e00ffa12778caf6817cc34459e6a
>>
>> John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>>
>> > Another big clue: if you remove the <col/> elements
>> > from the <colgroup>, it works again. It also works if you use
>> >
>> > <col></col>
>> >
>> > instead of
>> >
>> > <col />
>> >
>> >
>> >
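For anyone stuck on a pandoc build that predates the fix, the workaround above could be applied as a preprocessing step. This is only a rough sketch: the helper name `expand_col_tags` and the regex are assumptions for illustration, not part of pandoc, and the pattern only handles simple attribute values (no `>` inside quotes).

```python
import re

# Matches a self-closing <col> tag with optional attributes,
# e.g. <col/>, <col /> or <col style="width: 33%;" />.
# The \b keeps <colgroup> from matching.
SELF_CLOSING_COL = re.compile(r"<col\b([^<>]*?)\s*/>", re.IGNORECASE)


def expand_col_tags(markdown_text):
    """Rewrite <col ... /> as <col ...></col> (hypothetical helper)."""
    return SELF_CLOSING_COL.sub(
        lambda m: "<col" + m.group(1) + "></col>", markdown_text
    )


if __name__ == "__main__":
    sample = '<colgroup>\n<col />\n<col style="width: 33%;" />\n</colgroup>'
    print(expand_col_tags(sample))
```

The rewritten text would then be passed to pandoc in place of the original bug.md.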
>> > John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>> >
>> >> Thank you for the minimal test case!
>> >> Actually one can see the issue just with
>> >>
>> >> pandoc --section-divs bug.md
>> >>
>> >> At the end there is
>> >>
>> >> </div>
>> >> </section>
>> >> </section>
>> >>
>> >> where you'd want
>> >>
>> >> </section>
>> >> </div>
>> >> </section>
>> >>
>> >> >> The difference is that, with the colgroup, the <div> tags are
>> >> >> being parsed as raw HTML blocks, while without it, we get a
>> >> >> native Div in the AST (which is what we want in this case).
>> >> >>
>> >> >> Somehow the colgroup is interfering with parsing of the native
>> >> Div.
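The out-of-order closing tags described above can be detected mechanically. Here is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it checks nesting only (it does not report tags left unclosed at the end, and it is not an HTML validator).

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Collects closing tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []  # list of (closing_tag, expected_tag_or_None)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        expected = self.stack[-1] if self.stack else None
        self.errors.append((tag, expected))
        # Recover by unwinding to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check_nesting(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


if __name__ == "__main__":
    print(check_nesting("<section><div><section></div></section></section>"))
```

An empty list means every closing tag matched the innermost open element.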
>> >>
>> >> If you don't mind reporting this at
>> >> https://github.com/jgm/pandoc/issues (including this information)
>> >> >> it will help us keep track. Looking at the code, I currently
>> >> have no idea why this is happening.
>> >>
>> >> Gary Glass <garyglassp...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>> >>
>> >> >>> Here's the simplest file I could make to repro the issue. The pandoc
>> >> >>> command is very simple:
>> >> >>>
>> >> >>> pandoc --output=bug.epub --to=epub3 bug.md
>> >> >>>
>> >> >>> It produces an HTML file with a mismatched section tag.
>> >> >>>
>> >> >>> If you comment out the colgroup, the output is fine.
>> >> >>>
>> >> >>> On Friday, July 2, 2021 at 6:55:27 PM UTC+2 John MacFarlane wrote:
>> >>>
>> >>>>
>> >> >>>> Pandoc won't emit invalid HTML itself, but if you include
>> >> >>>> invalid HTML, it just dutifully passes it through verbatim.
>> >> >>>>
>> >> >>>> Checking HTML syntax is not pandoc's job. Use epubcheck
>> >>>> to verify the EPUB if you like.
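Short of running epubcheck, a quick well-formedness pass would catch this particular failure, since EPUB 3 content documents are XHTML and have to parse as XML. A sketch under that assumption: `malformed_xhtml` is a hypothetical helper, it treats every `.xhtml` or `.html` member of the archive as a content document, and it is no substitute for epubcheck.

```python
import zipfile
import xml.etree.ElementTree as ET


def malformed_xhtml(epub):
    """Return {member name: parser message} for members that are not well-formed XML.

    `epub` may be a path or a binary file-like object (hypothetical helper).
    """
    problems = {}
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if not name.endswith((".xhtml", ".html")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as exc:
                # A mismatched </div> / </section> pair ends up here.
                problems[name] = str(exc)
    return problems


if __name__ == "__main__":
    import sys

    for member, message in malformed_xhtml(sys.argv[1]).items():
        print(f"{member}: {message}")
```

Run against bug.epub, this should list the chapter file containing the mismatched section tag, if the bug is present.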
>> >>>>
>> >>>> Gary Glass <garyglassp...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>> >>>>
>> >>>> > I figured out the source of the issue. I had an html table in the markdown
>> >>>> > and I added a colgroup to the table. The colgroup caused the problem.
>> >>>> > Removing it made it go away.
>> >>>> >
>> >>>> > Colgroup is not a commonly used tag (in my experience), but I think the bug
>> >>>> > is that pandoc shouldn't just emit invalid epub html when the source code
>> >>>> > is valid, even if it doesn't know what to do with it. Report an error or
>> >>>> > something! The html looked something like this:
>> >>>> >
>> >>>> > <table>
>> >>>> > <colgroup>
>> >>>> > <col />
>> >>>> > <col style="width: 33%;" />
>> >>>> > <col />
>> >>>> > </colgroup>
>> >>>> > <tr>
>> >>>> > <td>...</td>
>> >>>> > <td>...</td>
>> >>>> > <td>...</td>
>> >>>> > </tr>
>> >>>> > ...
>> >>>> > </table>
>> >>>> >
>> >>>> > On Thursday, July 1, 2021 at 5:57:57 PM UTC+2 John MacFarlane wrote:
>> >>>> >
>> >>>> >>
>> >>>> >> No ideas. We'd have to see the actual files to know more.
>> >>>> >>
>> >>>> >>
>> >>>> >
>> >>>>
>> >>>
>> >>> # header 1
>> >>>
>> >>> <div>
>> >>>
>> >>> ## header 2
>> >>>
>> >>> <table>
>> >>> <colgroup>
>> >>> <col />
>> >>> <col />
>> >>> <col />
>> >>> </colgroup>
>> >>> <thead>
>> >>> <tr>
>> >>> <th>a</th>
>> >>> <th>b</th>
>> >>> <th>c</th>
>> >>> </tr>
>> >>> </thead>
>> >>> <tbody>
>> >>> <tr>
>> >>> <td>xxx</td>
>> >>> <td>xxx</td>
>> >>> <td>xxx</td>
>> >>> </tr>
>> >>> </tbody>
>> >>> </table>
>> >>>
>> >>> </div>
>>
>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/53332f68-2c48-4416-91b8-8e34395d0859n%40googlegroups.com.
------=_Part_311_1328582213.1625810165040-- ------=_Part_310_1620057043.1625810165040--