From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: File splitting bug
Date: Tue, 06 Jul 2021 09:19:23 -0700
To: Gary Glass, pandoc-discuss

Thank you for the minimal test case!
Actually one can see the issue just with

    pandoc --section-divs bug.md

At the end there is

where you'd want

The difference is that, with the colgroup, the
tags are being parsed as raw HTML blocks, while without it,
we get a native Div in the AST (which is what we want in this
case). Somehow the colgroup is interfering with parsing of
the native Div.

If you don't mind reporting this at
https://github.com/jgm/pandoc/issues
(including this information) it will help us keep track.
Looking at the code, I currently have no idea why this
is happening.

Gary Glass writes:

> Here's the simplest file I could make to repro the issue. The pandoc
> command is very simple:
>
> pandoc --output=bug.epub --to=epub3 bug.md
>
> It produces an HTML file with a mismatched section tag.
>
> If you comment out the colgroup, the output is fine.
>
> On Friday, July 2, 2021 at 6:55:27 PM UTC+2 John MacFarlane wrote:
>
>>
>> Pandoc won't emit invalid HTML itself, but if you include
>> invalid HTML, it just dutifully passes it through verbatim.
>>
>> Checking HTML syntax is not pandoc's job. Use epubcheck
>> to verify the EPUB if you like.
>>
>> Gary Glass writes:
>>
>> > I figured out the source of the issue. I had an html table in the markdown
>> > and I added a colgroup to the table. The colgroup caused the problem.
>> > Removing it made it go away.
>> >
>> > Colgroup is not a commonly used tag (in my experience), but I think the bug
>> > is that pandoc shouldn't just emit invalid epub html when the source code
>> > is valid, even if it doesn't know what to do with it. Report an error or
>> > something! The html looked something like this:
>> >
>> > ...
>> >
>> > .........
>> >
>> > On Thursday, July 1, 2021 at 5:57:57 PM UTC+2 John MacFarlane wrote:
>> >
>> >>
>> >> No ideas. We'd have to see the actual files to know more.
>> >>
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/38ac5d4c-8cba-4c23-a313-bf81e79779e7n%40googlegroups.com.
>>
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fd258aa4-a793-4d12-bb15-3f55fc2d0e4an%40googlegroups.com.
> # header 1
>
>
>
> ## header 2
>
> abc
> xxxxxxxxx
>
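
The message above distinguishes two parses of the same wrapper: raw HTML blocks versus a native Div in the AST. One way to tell which parse a given input produced is to count node types in pandoc's JSON AST. The sketch below is only an illustration: `count_nodes` is a hypothetical helper, and the two sample ASTs are hand-written to show the shape of each parse. They are not pandoc's actual output for bug.md.

```python
from collections import Counter


def count_nodes(node, counts=None):
    """Recursively count pandoc AST nodes by their "t" (constructor) tag."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "t" in node:
            counts[node["t"]] += 1
        for value in node.values():
            count_nodes(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_nodes(item, counts)
    return counts


# Hand-written sample (assumption): the wrapper tags were parsed as raw HTML.
raw_parse = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "RawBlock", "c": ["html", "<div>"]},
        {"t": "Para", "c": [{"t": "Str", "c": "abc"}]},
        {"t": "RawBlock", "c": ["html", "</div>"]},
    ],
}

# Hand-written sample (assumption): the wrapper was parsed as a native Div.
native_parse = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {
            "t": "Div",
            "c": [["", [], []], [{"t": "Para", "c": [{"t": "Str", "c": "abc"}]}]],
        },
    ],
}

raw_counts = count_nodes(raw_parse)
native_counts = count_nodes(native_parse)
```

To apply this to a real document, one could load the output of `pandoc -t json bug.md` with `json.load` and pass the result to `count_nodes`. A nonzero `RawBlock` count together with a missing `Div` would suggest the raw-HTML parse described in the message.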