public inbox archive for pandoc-discuss@googlegroups.com
From: "T. Kurt Bond" <tkurtbond-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Turn off headers for Mac OS clipboard content output in HTML?
Date: Tue, 28 Dec 2021 11:26:26 -0500
Message-ID: <CAN1EhV-+rH3p-Oj113nxCm=Sc8M8hKk1Rjci-sXoHMOYHC6CyA@mail.gmail.com>
In-Reply-To: <60674d49-1a0d-485d-ac2f-ae6a8283dde9n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


If you don't specify an input format, pandoc assumes markdown input. While
markdown allows literal HTML elements, it apparently doesn't treat a DOCTYPE
declaration as one, so pandoc handles the declaration as ordinary text and
escapes its angle brackets as character entities.

$ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -t html
&lt;!DOCTYPE html&gt;
<ol>
<li>
Bogus
</li>
</ol>

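However, if you add "-r html" (the same as "-f html", i.e. read the input as
HTML), the DOCTYPE is parsed as HTML and dropped from the fragment output: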

$ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -r html -t html
<ol>
<li>Bogus</li>
</ol>




On Tue, Dec 28, 2021 at 11:19 AM philmac-97jfqw80gc6171pxa8y+qA@public.gmane.org <philmac-97jfqw80gc6171pxa8y+qA@public.gmane.org> wrote:

> Thank you for your assistance! Indeed, I misread the situation, though the
> outcome is still strange. The HTML I am starting with in my clipboard is a
> complete document with a doctype declaration. The first line is:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
>
> Pandoc (pandoc -t html+smart) converts the angle brackets into HTML
> entity names:
>
> &lt;!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”&gt;
>
> Later on in my process, the content gets converted to RTF using textutil,
> which removes doctype declarations but retains the line above, converting
> the entity names back into angle brackets—which is how I got the idea that
> Pandoc had put it there.
>
> I am not sure why my Pandoc command converts the angle brackets in that
> first line—it leaves the other angle brackets in the document alone—but I
> can just remove that line from the clipboard text before processing it with
> Pandoc, so no problem.
> On Tuesday, December 28, 2021 at 10:48:46 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>
>> When standalone is not specified, pandoc typically outputs fragments
>> rather than a complete document.  This is convenient for the case where you
>> are processing multiple fragments into one document.  (This happens in HTML
>> output but also in other output; groff -ms, ConTeXt, LaTeX.)  So normal
>> HTML output I see when I don't specify standalone does *not* include the
>> doctype.
>>
>> $ echo '* Bogus' | pandoc -r rst -w html
>> <ul>
>> <li>Bogus</li>
>> </ul>
>>
>> This is with pandoc 2.16.2, installed with homebrew.
>>
>>
>> On Tue, Dec 28, 2021 at 9:33 AM Joseph Reagle <josep...-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> wrote:
>>
>>> The doctype declaration is a standard HTML feature that declares the
>>> version of HTML in use. In `--standalone` mode, pandoc includes one at
>>> the start of an HTML document.
>>>
>>> I'm confused, however. You haven't specified standalone mode. (And why
>>> would you want them removed in any case?) And the behavior you are
>>> describing doesn't correspond to recent versions -- I'm using 2.16.2. I'm
>>> not sure when/if pandoc last used HTML 4.01 Strict.
>>>
>>> In any case, you could create your own HTML template, without a doctype
>>> declaration.
>>>
>>> https://pandoc.org/MANUAL.html#templates
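A minimal sketch of such a template follows; the file name `body-only.html` is hypothetical. Pandoc templates substitute `$variable$` placeholders, and `$body$` holds the converted document, so a template containing only `$body$` emits no DOCTYPE, `<html>`, or `<head>`.

```shell
# Write a custom pandoc template (hypothetical file name) that outputs only
# the converted body. Single quotes keep the shell from expanding $body$.
printf '%s\n' '$body$' > body-only.html

# Usage (needs pandoc):
#   echo '* Bogus' | pandoc -r rst -w html --template=body-only.html
```

`--template` implies `--standalone`, so this custom template takes the place of the default one that would otherwise add the DOCTYPE.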
>>>
>>> On 21-12-27 15:04, phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
>>> > I am using Pandoc to convert dumb quotes to smart quotes in HTML. The
>>> > HTML is on my macOS clipboard:
>>> >
>>> > pbpaste | pandoc -t html+smart | pbcopy
>>> >
>>> > The output begins with
>>> >
>>> > <!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”>
>>> >
>>> > and a blank line.
>>> >
>>> > Is it possible to turn this off?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "pandoc-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/pandoc-discuss/e8eac3cc-feb6-e3af-dc9d-d3fe0b964925%40reagle.org
>>> .
>>>
>>
>>
>> --
>> T. Kurt Bond, tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, https://tkurtbond.github.io
>>


-- 
T. Kurt Bond, tkurtbond-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, https://tkurtbond.github.io


