The trouble with using -r html or -f html is that this strips out the
element, so I lose the formatting.
That is, if I apply pandoc -r html -t html+smart to this:
Font names that have more than one word —
like Trebuchet MS — need to
be surrounded by quotes, for example "Trebuchet
MS".
The outcome is just:
Font names that have more than one word — like
Trebuchet MS — need to be
surrounded by quotes, for example "Trebuchet
MS".
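A rough way to check which elements a conversion drops is to list the tag names on each side and compare them. This is only a sketch: `tag_names` is a hypothetical helper name, and the regex is a crude match on opening tags, not a real HTML parser.

```shell
# Hypothetical helper: print the unique, lowercased names of the opening tags
# found on stdin, one per line. Closing tags are skipped because "</" does
# not match the pattern.
tag_names() {
  grep -o '<[A-Za-z][A-Za-z0-9]*' | tr -d '<' | tr '[:upper:]' '[:lower:]' | sort -u
}

# Assumed usage on macOS with bash (needs pbpaste and pandoc, so left commented):
# comm -23 <(pbpaste | tag_names) <(pbpaste | pandoc -r html -t html+smart | tag_names)
```

The `comm -23` line would print the tags present in the input but missing from the output. Since pandoc emits a fragment unless standalone is specified, wrapper tags such as html, head, and body would show up in that list too.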
On Tuesday, December 28, 2021 at 11:26:40 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> If you don't specify an input format, pandoc assumes markdown input, and
> while markdown allows literal inclusions of HTML elements, it apparently
> doesn't allow DOCTYPE declarations, so it does not consider that to be
> HTML, and translates the angle brackets into character entities.
>
> $ echo '- Bogus
> ' | pandoc -t html
> &lt;!DOCTYPE html&gt;
>
> -
> Bogus
>
>
>
> However, if you add "-r html" everything is fine:
>
> $ echo '- Bogus
> ' | pandoc -r html -t html
>
> - Bogus
>
>
>
>
>
> On Tue, Dec 28, 2021 at 11:19 AM phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org
> wrote:
>
>> Thank you for your assistance! Indeed, I misread the situation, though
>> the outcome is still strange. The HTML I am starting with in my clipboard
>> is a complete document with a doctype declaration. The first line is:
>>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
>>
>> Pandoc (pandoc -t html+smart) converts the angle brackets into HTML
>> entity names:
>>
>> &lt;!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN”
>> “http://www.w3.org/TR/html4/strict.dtd”&gt;
>>
>> Later on in my process, the content gets converted to RTF using textutil,
>> which removes doctype declarations but retains the line above, converting
>> the entity names back into angle brackets—which is how I got the idea that
>> Pandoc had put it there.
>>
>> I am not sure why my Pandoc command converts the angle brackets in that
>> first line—it leaves the other angle brackets in the document alone—but I
>> can just remove that line from the clipboard text before processing it with
>> Pandoc, so no problem.
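
Removing that first line can be done inside the same pipeline. A minimal sketch, assuming the doctype declaration sits entirely on line 1 of the clipboard text; `strip_doctype` is a hypothetical helper name, not something Pandoc provides:

```shell
# Hypothetical helper: delete line 1 of stdin if it starts with a DOCTYPE
# declaration (any letter case); every other line passes through unchanged.
# Assumption: the declaration does not continue onto a second line.
strip_doctype() {
  sed '1{/^[[:space:]]*<![Dd][Oo][Cc][Tt][Yy][Pp][Ee]/d;}'
}

# Assumed usage on macOS (needs pbpaste, pbcopy and pandoc, so left commented):
# pbpaste | strip_doctype | pandoc -t html+smart | pbcopy
```

A DOCTYPE that appears anywhere other than line 1 is left alone, so the rest of the document is not touched.
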
>> On Tuesday, December 28, 2021 at 10:48:46 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
>> wrote:
>>
>>> When standalone is not specified, pandoc typically outputs fragments
>>> rather than a complete document. This is convenient when you are
>>> combining multiple fragments into one document. (This applies to HTML
>>> output and to other output formats as well: groff -ms, ConTeXt, LaTeX.)
>>> So the normal HTML output I see when I don't specify standalone does
>>> *not* include the doctype.
>>>
>>> $ echo '* Bogus' | pandoc -r rst -w html
>>>
>>>
>>> This is with pandoc 2.16.2, installed with homebrew.
>>>
>>>
>>> On Tue, Dec 28, 2021 at 9:33 AM Joseph Reagle
>>> wrote:
>>>
>>>> The doctype declaration is a standard HTML feature that declares the
>>>> version of HTML in use. Pandoc, especially in `--standalone` mode,
>>>> includes one at the start of an HTML document.
>>>>
>>>> I'm confused, however. You haven't specified standalone mode. (And why
>>>> would you want them removed in any case?) And the behavior you are
>>>> describing doesn't correspond to recent versions -- I'm using 2.16.2. I'm
>>>> not sure when/if pandoc last used HTML4.01 strict.
>>>>
>>>> In any case, you could create your own HTML template, without a doctype
>>>> declaration.
>>>>
>>>> https://pandoc.org/MANUAL.html#templates
>>>>
>>>> On 21-12-27 15:04, phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
>>>> > I am using Pandoc to convert dumb quotes to smart quotes in HTML. The
>>>> > HTML is on my macOS clipboard:
>>>> >
>>>> > pbpaste | pandoc -t html+smart | pbcopy
>>>> >
>>>> > The output begins with
>>>> >
>>>> > <!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”>
>>>> >
>>>> > and a blank line.
>>>> >
>>>> > Is it possible to turn this off?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "pandoc-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/pandoc-discuss/e8eac3cc-feb6-e3af-dc9d-d3fe0b964925%40reagle.org
>>>> .
>>>>
>>>
>>>
>>> --
>>> T. Kurt Bond, tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, https://tkurtbond.github.io
>>>
>>
>
>
>