Hi,

I found a strange behaviour when converting some HTML files to asciidoc.

Versions used:
asciidoc 9.1.0
pandoc 2.16.2

Example input:

<!DOCTYPE HTML>
<html>
<head>
<title>Xx</title>
</head>
<body>
<a href="x.htm"><i>Xx</i></a><i>,</i>
</body>
</html>


With "pandoc --wrap=none -f html -t asciidoc" I get this asciidoc output:

link:x.htm[_Xx_]__,__

The double underscores look suspicious, and when I run the result through "asciidoc -b docbook" and then xmllint, I get:

z.xml:10: parser error : Unescaped '<' not allowed in attributes values
<simpara>link:x.htm<emphasis><phrase role="<emphasis>Xx</emphasis>">,</phrase></


The corresponding DocBook line generated by asciidoc:

<simpara>link:x.htm<emphasis><phrase role="<emphasis>Xx</emphasis>">,</phrase></emphasis></simpara>
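
To find other affected places in converted files, something like the following might help. It is only a rough sketch: it assumes the trigger is always a link macro whose closing bracket is directly followed by a double underscore, which is all this one example shows. The function name is made up for illustration and is not part of pandoc or asciidoc.

```python
import re
from typing import List

# Assumption: the problematic pattern is "link:target[text]" immediately
# followed by "__" (no space in between), as in the example output above.
_LINK_THEN_DOUBLE_UNDERSCORE = re.compile(r"link:\S+?\[[^\]]*\]__")


def find_suspicious_lines(asciidoc_text: str) -> List[int]:
    """Return 1-based numbers of lines where a link macro is directly followed by '__'."""
    return [
        number
        for number, line in enumerate(asciidoc_text.splitlines(), start=1)
        if _LINK_THEN_DOUBLE_UNDERSCORE.search(line)
    ]


if __name__ == "__main__":
    sample = "link:x.htm[_Xx_]__,__\nlink:x.htm[_Xx_] _,_\n"
    print(find_suspicious_lines(sample))
```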

Is this a known bug?


If I add a space before the comma...

<a href="x.htm"><i>Xx</i></a><i> ,</i>

then I get

link:x.htm[_Xx_] _,_

which causes no issue. Also adding a space before the emphasis...

<a href="x.htm"><i>Xx</i></a> <i>,</i>


creates an asciidoc file which can be rendered:

link:x.htm[_Xx_] _,_
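
As a stopgap, the second workaround could be applied to the HTML before it is passed to pandoc. Below is a minimal sketch, assuming it is acceptable that the extra space changes the rendered text slightly. The helper name is hypothetical, and treating <em> the same way as <i> is my assumption; I only tried <i>.

```python
import re

# Hypothetical preprocessing helper, not part of pandoc or asciidoc.
# Matches a closing </a> directly followed by an opening <i> or <em> tag.
# Treating <em> like <i> is an assumption; only <i> appears in the example.
_LINK_THEN_EMPHASIS = re.compile(r"(</a>)(<(?:i|em)\b)", re.IGNORECASE)


def separate_link_and_emphasis(html: str) -> str:
    """Insert one space between a closing </a> and an immediately following <i>/<em>."""
    return _LINK_THEN_EMPHASIS.sub(r"\1 \2", html)


if __name__ == "__main__":
    print(separate_link_and_emphasis('<a href="x.htm"><i>Xx</i></a><i>,</i>'))
```

The result can then be converted with the same "pandoc --wrap=none -f html -t asciidoc" call as before.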



Has anyone seen this before? Is there already a fix?


cheers,
Frank

