* name attribute lost in <a> by -parse-raw html reader, since pandoc 1.16
@ 2016-11-11 23:57 john.r.rose-QHcLZuEGTsvQT0dZR+AlfA
[not found] ` <d1a2140b-7c27-4dc5-9d00-6d5a601d0dfa-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: john.r.rose-QHcLZuEGTsvQT0dZR+AlfA @ 2016-11-11 23:57 UTC (permalink / raw)
To: pandoc-discuss
I am parsing legacy HTML files that make heavy use of <a name=>.
I'm using --parse-raw to preserve as much HTML structure as possible.
(The result translates to a mix of pandoc-markdown and HTML fragments.)
Pandoc apparently mistreats <a name=> even in --parse-raw mode. This feels
like a bug.
Here's the behavior:
$ echo '<a name="anchor"/>hello world' | /usr/local/bin/pandoc-1.15.2 --parse-raw -f html -t html
<a name="anchor"></a>hello world
$ echo '<a name="anchor"/>hello world' | /usr/local/bin/pandoc-1.16.0.2 --parse-raw -f html -t html
<a href=""></a>hello world
(And so on, up to the present 1.18.)
If I use <a id="anchor"> instead, as a workaround, I get either
<span id="anchor"> or <a href="" id="anchor">, depending on release.
Those are OK, but they also fall short of the mark.
In any case, because I am working with legacy files, it would be unpleasant
to preprocess every <a name=> attribute into an <a id=> attribute.
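For anyone who does want to try the preprocessing route, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not anything pandoc provides: name_to_id is a hypothetical helper, it assumes quoted attribute values, and it does not check whether the tag already carries an id. A regex is not a real HTML parser, so unusual markup may slip through or be rewritten incorrectly.

```python
import re

# Matches a quoted name attribute inside an opening <a ...> tag.
# Assumption: attribute values are quoted and no '>' appears inside them.
_A_NAME = re.compile(
    r'(<a\b[^>]*?\s)name(\s*=\s*(?:"[^"]*"|\'[^\']*\'))',
    re.IGNORECASE,
)


def name_to_id(html: str) -> str:
    """Rewrite <a name="x"> as <a id="x">; all other text is left as-is."""
    return _A_NAME.sub(r"\1id\2", html)


if __name__ == "__main__":
    print(name_to_id('<a name="anchor"/>hello world'))
    # prints: <a id="anchor"/>hello world
```

The rewritten text could then be piped to pandoc as before; what pandoc does with the id attribute still depends on the release, as described above.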
It would be pleasant for me if the pandoc authors agreed that the present
behavior is a bug,
and made --parse-raw preserve the name attributes.
Comments? Am I missing a workaround or a theory of operation?
Thanks.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d1a2140b-7c27-4dc5-9d00-6d5a601d0dfa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: name attribute lost in <a> by -parse-raw html reader, since pandoc 1.16
[not found] ` <d1a2140b-7c27-4dc5-9d00-6d5a601d0dfa-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-11-12 0:34 ` John MacFarlane
[not found] ` <20161112003422.GH77829-jF64zX8BO0/xZR0Txf6TOv112/MQ1Lpv@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2016-11-12 0:34 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
With pandoc 1.18:
% echo '<a name="anchor"/>hello world' | pandoc
<p><a name="anchor"/>hello world</p>
(Same thing with or without --parse-raw.) Is this what you expect?
+++ john.r.rose-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org [Nov 11 16 15:57 ]:
> I am parsing legacy HTML files that make heavy use of <a name=>.
> I'm using -parse-raw to preserve as much HTML structure as possible.
> (The result translates to a mix of pandoc-markdown and HTML fragments.)
> Pandoc apparently mistreats <a name=> even in -parse-raw mode. This
> feels like a bug.
> Here's the behavior:
> $ echo '<a name="anchor"/>hello world' | /usr/local/bin/pandoc-1.15.2
> --parse-raw -f html -t html
> <a name="anchor"></a>hello world
> $ echo '<a name="anchor"/>hello world' | /usr/local/bin/pandoc-1.16.0.2
> --parse-raw -f html -t html
> <a href=""></a>hello world
> (And so on, up to the present 1.18.)
> If I use <a id="anchor"> instead, as a workaround, I get either <span
> id="anchor"> or <a href="" id="anchor">,
> depending on release. Those are OK, but they seem to be short of the
> mark also.
> In any case, because I am working with legacy files, it would be
> unpleasant to preprocess the <a name=>
> attributes to <a id=> attributes.
> It would be pleasant for me if the pandoc authors agreed that the
> present behavior is a bug,
> and made -parse-raw preserve the name attributes.
> Comments? Am I missing a workaround or a theory of operation?
> Thanks.
* Re: name attribute lost in <a> by -parse-raw html reader, since pandoc 1.16
[not found] ` <20161112003422.GH77829-jF64zX8BO0/xZR0Txf6TOv112/MQ1Lpv@public.gmane.org>
@ 2016-11-12 1:28 ` john.r.rose-QHcLZuEGTsvQT0dZR+AlfA
[not found] ` <c5632684-9f96-4433-9061-fe960be21238-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: john.r.rose-QHcLZuEGTsvQT0dZR+AlfA @ 2016-11-12 1:28 UTC (permalink / raw)
To: pandoc-discuss
On Friday, November 11, 2016 at 4:34:25 PM UTC-8, John MacFarlane wrote:
>
> With pandoc 1.18:
>
> % echo '<a name="anchor"/>hello world' | pandoc
> <p><a name="anchor"/>hello world</p>
>
> (Same thing with or without --parse-raw.) Is this what you expect?
>
>
Yes. But then:
% echo '<a name="anchor"/>hello world' | /usr/local/bin/pandoc-1.18 -f html
<a href=""></a>hello world
The "-f html" does something surprising here.
* Re: name attribute lost in <a> by -parse-raw html reader, since pandoc 1.16
[not found] ` <c5632684-9f96-4433-9061-fe960be21238-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-11-12 1:54 ` john.r.rose-QHcLZuEGTsvQT0dZR+AlfA
0 siblings, 0 replies; 4+ messages in thread
From: john.r.rose-QHcLZuEGTsvQT0dZR+AlfA @ 2016-11-12 1:54 UTC (permalink / raw)
To: pandoc-discuss
More info:
% echo '<a bozo="clown"/>hello world' | /usr/local/bin/pandoc-1.18 -f markdown -t html
<p><a bozo="clown"/>hello world</p>
So the markdown reader copies HTML fragments through without trying to
understand them. But:
% echo '<a bozo="clown"/>hello world' | /usr/local/bin/pandoc-1.18 -f html -t html
<a href=""></a>hello world
The HTML reader apparently normalizes away attributes of an <a> tag that it
does not handle, "name=" included.
I filed https://github.com/jgm/pandoc/issues/3226
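To see which anchors a conversion drops across a batch of legacy files, one could compare the anchor names in the input with the name and id values that survive in the output. Below is a rough sketch using Python's standard html.parser module. AnchorCollector and lost_anchors are hypothetical names made up for this illustration, and the choice to count either a surviving name or a surviving id as "preserved" is an assumption.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects <a name=...> values and every surviving anchor target."""

    def __init__(self):
        super().__init__()
        self.names = set()    # values of name= on <a> tags
        self.targets = set()  # values of id= on any tag, or name= on <a>

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <a name="x"/>.
        for key, value in attrs:
            if value is None:
                continue
            if tag == "a" and key == "name":
                self.names.add(value)
                self.targets.add(value)
            elif key == "id":
                self.targets.add(value)


def collect(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser


def lost_anchors(source_html, converted_html):
    """Anchor names in the source with no matching name or id in the output."""
    return collect(source_html).names - collect(converted_html).targets


if __name__ == "__main__":
    src = '<a name="anchor"/>hello world'
    print(lost_anchors(src, '<a href=""></a>hello world'))
```

With the input and output from the -f html example above, this reports "anchor" as lost; with the pandoc 1.15.2 output it reports nothing.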