The trouble with using -r html (or its synonym -f html) is that pandoc then strips out the <head> element, including the <style> block, so I lose the formatting.

That is, if I apply pandoc -r html -t html+smart to this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title></title>
<meta name="Generator" content="Cocoa HTML Writer">
<meta name="CocoaVersion" content="2113">
<style type="text/css">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Arial; color: #151515; -webkit-text-stroke: #151515; background-color: #d5e4ff}
span.s1 {font-kerning: none}
span.s2 {font: 16.0px Courier; font-kerning: none; background-color: #f1f1f1}
</style>
</head>
<body>
<p class="p1"><span class="s1">Font names that have more than one word — like </span><span class="s2">Trebuchet MS</span><span class="s1"> — need to be surrounded by quotes, for example </span><span class="s2">"Trebuchet MS"</span><span class="s1">.</span></p>
</body>
</html>

The outcome is just:

<p><span class="s1">Font names that have more than one word — like </span><span class="s2">Trebuchet MS</span><span class="s1"> — need to be surrounded by quotes, for example </span><span class="s2">"Trebuchet MS"</span><span class="s1">.</span></p>
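
One possible way to carry the styling across, sketched here rather than taken from the thread: pull the <style> block out of the input and hand it back to pandoc through its --standalone and --include-in-header options. The helper names extract_style and convert_keep_style are hypothetical, and the sketch assumes the <style> and </style> tags each start on their own line, as they do in the Cocoa-generated sample above. Two caveats apply: pandoc's standalone template writes its own <head>, and since the class on the <p> element is dropped in the output shown above, a rule such as p.p1 would presumably no longer match.

```shell
# Hypothetical helper: print the <style>...</style> block of an HTML
# document read from stdin. Assumes the opening and closing tags each
# sit on a line of their own, as in the sample document.
extract_style() {
  sed -n '/<style/,/<\/style>/p'
}

# Hypothetical helper: convert the HTML file named in $1 with the same
# reader and writer used above, re-attaching the extracted style block
# to the header of a standalone document. Requires pandoc on the PATH.
convert_keep_style() {
  style_file=$(mktemp) || return 1
  extract_style < "$1" > "$style_file"
  pandoc -r html -t html+smart --standalone \
    --include-in-header="$style_file" "$1"
  status=$?
  rm -f "$style_file"
  return "$status"
}
```

With the clipboard saved to a file first (for example pbpaste > in.html, where in.html is an arbitrary name), the call would be convert_keep_style in.html | pbcopy.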

On Tuesday, December 28, 2021 at 11:26:40 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
If you don't specify an input format, pandoc assumes markdown input. Markdown allows literal inclusion of HTML elements, but it apparently doesn't treat a DOCTYPE declaration as HTML, so pandoc translates its angle brackets into character entities.
$ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -t html
&lt;!DOCTYPE html&gt;
<ol>
<li>
Bogus
</li>
</ol>
However, if you add "-r html", everything is fine:
$ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -r html -t html
<ol>
<li>Bogus</li>
</ol>

Thank you for your assistance! Indeed, I misread the situation, though the outcome is still strange. The HTML I am starting with in my clipboard is a complete document with a doctype declaration. The first line is:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Pandoc (pandoc -t html+smart) converts the angle brackets into HTML entity names:

&lt;!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”&gt;

Later on in my process, the content gets converted to RTF using textutil, which removes doctype declarations but retains the line above, converting the entity names back into angle brackets—which is how I got the idea that Pandoc had put it there.

I am not sure why my Pandoc command converts the angle brackets in that first line—it leaves the other angle brackets in the document alone—but I can just remove that line from the clipboard text before processing it with Pandoc, so no problem.
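
A minimal sketch of that workaround, assuming the declaration always sits on the first line of the clipboard text. The function name strip_doctype is hypothetical; it deletes line 1 only when that line begins with a DOCTYPE declaration (in any letter case) and passes everything else through unchanged.

```shell
# Hypothetical helper: drop the first line of stdin if, and only if,
# it starts with a DOCTYPE declaration. All other lines pass through.
strip_doctype() {
  sed '1{/^<![Dd][Oo][Cc][Tt][Yy][Pp][Ee]/d;}'
}

# Intended use on macOS, with the command from this thread
# (left commented out because it needs pbpaste, pbcopy, and pandoc):
# pbpaste | strip_doctype | pandoc -t html+smart | pbcopy
```

Restricting the match to line 1 keeps the filter from touching a later line that merely happens to quote a doctype.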
On Tuesday, December 28, 2021 at 10:48:46 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
When standalone is not specified, pandoc typically outputs fragments rather than a complete document. This is convenient for the case where you are processing multiple fragments into one document. (This happens in HTML output but also in other output formats: groff -ms, ConTeXt, LaTeX.) So the HTML output I normally see when I don't specify standalone does not include the doctype.
$ echo '* Bogus' | pandoc -r rst -w html
<ul>
<li>Bogus</li>
</ul>
This is with pandoc 2.16.2, installed with homebrew.


On Tue, Dec 28, 2021 at 9:33 AM Joseph Reagle <josep...-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> wrote:
The doctype declaration is a standard HTML feature that declares the version of HTML in use. Pandoc, especially in `--standalone` mode, includes one at the start of an HTML document.

I'm confused, however. You haven't specified standalone mode. (And why would you want them removed in any case?) And the behavior you are describing doesn't correspond to recent versions -- I'm using 2.16.2. I'm not sure when/if pandoc last used HTML4.01 strict.

In any case, you could create your own HTML template, without a doctype declaration.

https://pandoc.org/MANUAL.html#templates

On 21-12-27 15:04, phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
> I am using Pandoc to convert dumb quotes to smart quotes in HTML. The HTML is on my MacOS clipboard:
>
> pbpaste | pandoc -t html+smart | pbcopy
>
> The output begins with
>
> <!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”>
>
> and a blank line.
>
> Is it possible to turn this off?

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e8eac3cc-feb6-e3af-dc9d-d3fe0b964925%40reagle.org.

