public inbox archive for pandoc-discuss@googlegroups.com
* File splitting bug
From: Gary Glass @ 2021-07-01  6:46 UTC (permalink / raw)
  To: pandoc-discuss


I have a strange problem / bug. 

pandoc version 2.13

I'm converting Markdown to EPUB. I have two Markdown files with this general
format:

```markdown

## main header 

<div class="file-specific-class">

### subheader 1

Some text.

### subheader 2

Some text.

</div>

```

My pandoc settings specify `epub-chapter-level: 3`.

Here's the bug: for one of these files, the entire file is converted to one
HTML file in the EPUB. For the other, each h3 section is converted to a
separate HTML file, so the div that encapsulates the whole file is broken:
the opening tag is in the first file (with the h2 tag) and the closing tag
is in the last h3 section file.
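
A toy model that reproduces the two behaviours described here. It rests on an assumption this message does not state: that chapters are split only at top-level headers whose level is at most `epub-chapter-level`, so h3 headers nested inside a div that pandoc parses as a native Div stay in one file, while a div passed through as raw HTML leaves them at top level. The classes and `split_chapters` are hypothetical illustrations, not pandoc's code.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Header:
    level: int
    text: str


@dataclass
class Para:
    text: str


@dataclass
class Raw:
    html: str  # raw HTML passed through verbatim, e.g. "<div>"


@dataclass
class Div:
    blocks: list = field(default_factory=list)  # nested blocks of a native Div


Block = Union[Header, Para, Raw, Div]


def split_chapters(blocks: List[Block], chapter_level: int = 3) -> List[List[Block]]:
    """Start a new chapter at each top-level Header with level <= chapter_level.

    Blocks inside a Div are never inspected, so a native Div is kept whole
    (the simplifying assumption of this model).
    """
    chapters: List[List[Block]] = []
    for block in blocks:
        starts_chapter = isinstance(block, Header) and block.level <= chapter_level
        if starts_chapter or not chapters:
            chapters.append([block])
        else:
            chapters[-1].append(block)
    return chapters


# Two shapes of the same file: the div recognized as a native Div vs. kept as raw HTML.
native = [Header(2, "main header"),
          Div([Header(3, "subheader 1"), Para("Some text."),
               Header(3, "subheader 2"), Para("Some text.")])]
raw = [Header(2, "main header"), Raw('<div class="file-specific-class">'),
       Header(3, "subheader 1"), Para("Some text."),
       Header(3, "subheader 2"), Para("Some text."), Raw("</div>")]

print(len(split_chapters(native)))  # 1: the whole file stays together
for chapter in split_chapters(raw):  # 3 chapters; <div> and </div> land in different ones
    print(chapter)
```

In the raw case the opening tag ends up in the chapter with the h2 and the closing tag in the last h3 chapter, which matches the broken output described above.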

The weirder thing is that this only started yesterday even though I haven't 
made any structural changes to either file and I haven't upgraded pandoc.

I can't figure out why the two files are splitting differently. Any ideas?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com.

* Re: File splitting bug
From: John MacFarlane @ 2021-07-01 15:57 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


No ideas. We'd have to see the actual files to know more.


* Re: File splitting bug
From: Gary Glass @ 2021-07-02  7:07 UTC (permalink / raw)
  To: pandoc-discuss


I figured out the source of the issue. I had an HTML table in the Markdown
and I added a `<colgroup>` to the table. The colgroup caused the problem;
removing it made it go away.

`<colgroup>` is not a commonly used tag (in my experience), but I think the bug
is that pandoc shouldn't just emit invalid EPUB HTML when the source code
is valid, even if it doesn't know what to do with it. Report an error or
something! The HTML looked something like this:

```html
<table>
    <colgroup>
        <col />
        <col style="width: 33%;" />
        <col />
    </colgroup>
    <tr>
        <td>...</td>
        <td>...</td>
        <td>...</td>
    </tr>
    ...
</table>
```

* Re: File splitting bug
From: John MacFarlane @ 2021-07-02 16:55 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


Pandoc won't emit invalid HTML itself, but if you include
invalid HTML, it just dutifully passes it through verbatim.

Checking HTML syntax is not pandoc's job.  Use epubcheck
to verify the EPUB if you like.
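
As a lighter first check than epubcheck: EPUB 3 content documents are XHTML, so a chapter file whose tags do not balance (the symptom in this thread) is not even well-formed XML. A minimal sketch using only the Python standard library, assuming the chapter files inside the EPUB end in `.xhtml`; `find_malformed_xhtml` is a hypothetical helper and does far less than epubcheck (no schema or package validation).

```python
import sys
import xml.etree.ElementTree as ET
import zipfile


def find_malformed_xhtml(epub):
    """Return (member name, parser message) for each .xhtml member of the
    EPUB (a zip archive, given as a path or file object) that is not
    well-formed XML."""
    problems = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.endswith(".xhtml"):
                continue
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError as err:
                # e.g. a <div> opened in one chapter file but closed in another
                problems.append((name, str(err)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, message in find_malformed_xhtml(sys.argv[1]):
        print(f"{name}: {message}")
```

Saved as, say, `check_epub.py` (a hypothetical name), `python check_epub.py bug.epub` prints one line per chapter file that fails to parse.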

* Re: File splitting bug
From: Gary Glass @ 2021-07-06  6:31 UTC (permalink / raw)
  To: pandoc-discuss


Here's the simplest file I could make to reproduce the issue. The pandoc
command is very simple:

```sh
pandoc --output=bug.epub --to=epub3 bug.md
```

It produces an HTML file with a mismatched section tag.

If you comment out the colgroup, the output is fine.

Attachment: bug.md

```markdown
# header 1

<div>

## header 2

<table>
	<colgroup>
		<col />
		<col />
		<col />
	</colgroup>
	<thead>
		<tr>
			<th>a</th>
			<th>b</th>
			<th>c</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td>xxx</td>
			<td>xxx</td>
			<td>xxx</td>
		</tr>
	</tbody>
</table>

</div>
```

* Re: File splitting bug
From: John MacFarlane @ 2021-07-06 16:19 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


Thank you for the minimal test case!
Actually one can see the issue just with

```sh
pandoc --section-divs bug.md
```

At the end there is

```html
</div>
</section>
</section>
```

where you'd want

```html
</section>
</div>
</section>
```

The difference is that, with the colgroup, the <div> tags are
being parsed as raw HTML blocks, while without it, we get a
native Div in the AST (which is what we want in this case).

Somehow the colgroup is interfering with parsing of the native
Div.
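
The raw-versus-native distinction can be checked directly in the AST. A minimal sketch, assuming pandoc is on `PATH` and using its `json` output format; `div_kinds` and `ast_of` are hypothetical helper names. It walks the top-level blocks (and the contents of any native Div) and reports whether each opening `<div>` arrived as a native `Div` node or as an HTML `RawBlock`.

```python
import json
import subprocess


def div_kinds(blocks, found=None):
    """Record how each opening <div> reached the AST.

    'native' -> a Div node (its contents are searched too);
    'raw'    -> an HTML RawBlock whose text starts with "<div".
    Only top-level blocks and Div contents are walked (a simplification).
    """
    if found is None:
        found = []
    for block in blocks:
        kind = block.get("t")
        if kind == "Div":
            found.append("native")
            _attr, inner = block["c"]  # Div content is [attr, blocks]
            div_kinds(inner, found)
        elif kind == "RawBlock":
            fmt, text = block["c"]  # RawBlock content is [format, text]
            if fmt == "html" and text.lstrip().lower().startswith("<div"):
                found.append("raw")
    return found


def ast_of(path):
    """Run pandoc (assumed to be on PATH) and parse its JSON AST."""
    result = subprocess.run(["pandoc", "--to=json", path],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Usage: `print(div_kinds(ast_of("bug.md")["blocks"]))`. Going by the description here, the colgroup version of `bug.md` should report `raw` on 2.13, and `native` once the `<colgroup>` is removed.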

If you don't mind reporting this at
https://github.com/jgm/pandoc/issues (including this information),
it will help us keep track. Looking at the code, I currently
have no idea why this is happening.

* Re: File splitting bug
From: John MacFarlane @ 2021-07-06 16:31 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


Another big clue: if you remove the `<col/>` elements
from the `<colgroup>`, it works again. It also works if you use
`<col></col>` instead of `<col />`.
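
If upgrading is not an option, this observation suggests a workaround for pandoc 2.13: expand self-closing `<col />` tags before pandoc sees the file. A minimal sketch, not something pandoc provides; `expand_self_closing_cols` is a hypothetical helper, and the regex assumes attribute values never contain `/>`.

```python
import re
import sys

# Matches a self-closing <col .../> (but not <colgroup>), capturing its attributes.
SELF_CLOSING_COL = re.compile(r"<col\b([^>]*?)\s*/>")


def expand_self_closing_cols(text):
    """Rewrite every <col ... /> as <col ...></col>."""
    return SELF_CLOSING_COL.sub(r"<col\1></col>", text)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.stdout.write(expand_self_closing_cols(fh.read()))
```

For example, with the script saved as `expand_cols.py` (a hypothetical name): `python expand_cols.py bug.md | pandoc --to=epub3 --output=bug.epub`. Pandoc reads standard input when no input file is given.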

* Re: File splitting bug
From: John MacFarlane @ 2021-07-06 17:26 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


OK, I think I've fixed this in
commit f88ebf3ebf49e00ffa12778caf6817cc34459e6a

* Re: File splitting bug
From: Gary Glass @ 2021-07-07  7:30 UTC (permalink / raw)
  To: pandoc-discuss


Is there an installer for that rev? I'll be happy to test it.

* Re: File splitting bug
From: John MacFarlane @ 2021-07-07 16:47 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


You can try a nightly.
https://github.com/jgm/pandoc/actions/runs/1007239404

Gary Glass <garyglassphotography-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Is there an installer for that rev? I'll be happy to test it.
>
> On Tuesday, July 6, 2021 at 7:26:26 PM UTC+2 John MacFarlane wrote:
>
>>
>> OK, I think I've fixed this in
>> commit f88ebf3ebf49e00ffa12778caf6817cc34459e6a
>>
>> John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>>
>> > Another big clue: if you remove the <col/> elements
>> > from the <colgroup>, it works again. It also works if you use
>> >
>> > <col></col>
>> >
>> > instead of
>> >
>> > <col />
>> >
>> >
>> >
>> > John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>> >
>> >> Thank you for the minimal test case!
>> >> Actually one can see the issue just with
>> >>
>> >> pandoc --section-divs bug.md
>> >>
>> >> At the end there is
>> >>
>> >> </div>
>> >> </section>
>> >> </section>
>> >>
>> >> where you'd want
>> >>
>> >> </section>
>> >> </div>
>> >> </section>
>> >>
>> >> The difference is that, with the colgroup, the <div> tags are
>> >> being parsed as raw HTML blocks, while without it, we get a
>> >> native Div in the AST (which is what we want in this case).
>> >>
>> >> Somehow the colgroup is interfering with parsing of the native
>> >> Div.


* Re: File splitting bug
       [not found]                                     ` <m27di2ul3e.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
@ 2021-07-09  5:56                                       ` Gary Glass
       [not found]                                         ` <53332f68-2c48-4416-91b8-8e34395d0859n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Gary Glass @ 2021-07-09  5:56 UTC (permalink / raw)
  To: pandoc-discuss


Well for some reason that doesn't work. The pandoc.exe just hangs when I 
run it.

On Wednesday, July 7, 2021 at 6:48:04 PM UTC+2 John MacFarlane wrote:

>
> You can try a nightly.
> https://github.com/jgm/pandoc/actions/runs/1007239404
>


* Re: File splitting bug
       [not found]                                         ` <53332f68-2c48-4416-91b8-8e34395d0859n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-07-09 18:34                                           ` John MacFarlane
       [not found]                                             ` <m2a6mvtjy3.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2021-07-09 18:34 UTC (permalink / raw)
  To: Gary Glass, pandoc-discuss


What exact command line are you using?

Does

pandoc --version

work?
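
John's `pandoc --version` check can be scripted so that a stuck process cannot block the terminal. This sketch is not from the thread: `probe` is a hypothetical helper, and `pandoc.exe` is a placeholder for wherever the nightly binary was unpacked. It closes standard input because pandoc reads from stdin when no input file is given, so an invocation that never names a file can sit waiting for input and look like a hang.

```python
import subprocess


def probe(command, timeout=10.0):
    """Run `command` with stdin closed and a time limit.

    Returns (ok, message): ok is True only if the program exits with
    status 0 within `timeout` seconds; message is the first line of its
    output, or a short description of what went wrong.
    """
    try:
        result = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,  # EOF at once instead of waiting on the console
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return False, f"no exit after {timeout} seconds"
    except OSError as exc:  # e.g. executable not found or not runnable
        return False, str(exc)
    lines = (result.stdout or result.stderr).splitlines()
    return result.returncode == 0, lines[0] if lines else ""
```

Usage, with the placeholder path: `print(probe(["pandoc.exe", "--version"]))`. A timeout result points at the binary itself; an OS error usually means the path is wrong.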

Gary Glass <garyglassphotography-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Well for some reason that doesn't work. The pandoc.exe just hangs when I 
> run it.


* Re: File splitting bug
       [not found]                                             ` <m2a6mvtjy3.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
@ 2021-07-09 19:57                                               ` Gary Glass
  0 siblings, 0 replies; 13+ messages in thread
From: Gary Glass @ 2021-07-09 19:57 UTC (permalink / raw)
  To: pandoc-discuss


Sorry, stupid pilot error! But now I've run the nightly build (pandoc.exe 
2.14.0.3) against the original markdown files that revealed the issue and 
it seems to be working perfectly. Thanks for the quick fix!
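
Gary confirmed the fix by rerunning his original files. The symptom he first reported was an EPUB chapter file with a mismatched section tag, which John's `pandoc --section-divs bug.md` run earlier in the thread showed as a `</div>` arriving while a `<section>` was still open. A rough way to script that check is sketched below with Python's standard `html.parser` and `zipfile` modules. It is not from the thread and is no substitute for epubcheck, which John recommended; the function names are hypothetical, and it assumes XHTML-style input (as in EPUB chapter files) where every non-void element is closed explicitly.

```python
import zipfile
from html.parser import HTMLParser

# HTML void elements never take a closing tag. Assumption: this set
# covers the void elements likely to occur in EPUB chapter files.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Record every closing tag that does not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements, innermost last
        self.problems = []  # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <col /> opens nothing.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line, _ = self.getpos()
        if not self.stack:
            self.problems.append(f"line {line}: </{tag}> with nothing open")
        elif self.stack[-1] != tag:
            self.problems.append(f"line {line}: </{tag}> closes <{self.stack[-1]}>")
            # Unwind to the matching open tag, if any, so one error
            # does not cascade into many.
            if tag in self.stack:
                while self.stack.pop() != tag:
                    pass
        else:
            self.stack.pop()


def nesting_problems(markup):
    """Return a list of nesting errors found in one XHTML document."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + [f"<{tag}> never closed" for tag in checker.stack]


def epub_nesting_problems(path):
    """Check every .xhtml file inside an EPUB, which is a ZIP container."""
    report = {}
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if name.endswith(".xhtml"):
                problems = nesting_problems(epub.read(name).decode("utf-8"))
                if problems:
                    report[name] = problems
    return report
```

Usage: `epub_nesting_problems("bug.epub")` returns a dict mapping each chapter file to its problems; an empty dict means no mismatched tags were found. Self-closing `<col />` is deliberately ignored, since `<col>` is a void element and opens nothing.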

On Friday, July 9, 2021 at 8:34:59 PM UTC+2 John MacFarlane wrote:

>
> What exact command line are you using?
>
> Does
>
> pandoc --version
>
> work?


end of thread, other threads:[~2021-07-09 19:57 UTC | newest]

Thread overview: 13+ messages
2021-07-01  6:46 File splitting bug Gary Glass
     [not found] ` <297bc662-7841-4423-bcbb-534e99bbba09n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-07-01 15:57   ` John MacFarlane
     [not found]     ` <m21r8ijabs.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-02  7:07       ` Gary Glass
     [not found]         ` <38ac5d4c-8cba-4c23-a313-bf81e79779e7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-07-02 16:55           ` John MacFarlane
     [not found]             ` <m2o8bkveog.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-06  6:31               ` Gary Glass
     [not found]                 ` <fd258aa4-a793-4d12-bb15-3f55fc2d0e4an-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-07-06 16:19                   ` John MacFarlane
     [not found]                     ` <m21r8bjtys.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-06 16:31                       ` John MacFarlane
     [not found]                         ` <m2mtqzieuq.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-06 17:26                           ` John MacFarlane
     [not found]                             ` <m2k0m3icb1.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-07  7:30                               ` Gary Glass
     [not found]                                 ` <5d588eaf-0fd8-4023-8296-b9748189593cn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-07-07 16:47                                   ` John MacFarlane
     [not found]                                     ` <m27di2ul3e.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-09  5:56                                       ` Gary Glass
     [not found]                                         ` <53332f68-2c48-4416-91b8-8e34395d0859n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-07-09 18:34                                           ` John MacFarlane
     [not found]                                             ` <m2a6mvtjy3.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-07-09 19:57                                               ` Gary Glass

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).