From: philmac-97jfqw80gc6171pxa8y+qA@public.gmane.org
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: Turn off headers for Mac OS clipboard content output in HTML?
Date: Tue, 28 Dec 2021 08:32:43 -0800 (PST)
Message-ID: <6ae1c100-a3f1-4c6c-b763-3c1f2ace6dbfn@googlegroups.com>

The trouble with using -r html or -f html is that this strips out the <head> element, so I lose the formatting.

That is, if I apply pandoc -r html -t html+smart to this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title></title>
<meta name="Generator" content="Cocoa HTML Writer">
<meta name="CocoaVersion" content="2113">
<style type="text/css">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Arial; color: #151515; -webkit-text-stroke: #151515; background-color: #d5e4ff}
span.s1 {font-kerning: none}
span.s2 {font: 16.0px Courier; font-kerning: none; background-color: #f1f1f1}
</style>
</head>
<body>
<p class="p1"><span class="s1">Font names that have more than one word — like </span><span class="s2">Trebuchet MS</span><span class="s1"> — need to be surrounded by quotes, for example </span><span class="s2">"Trebuchet MS"</span><span class="s1">.</span></p>
</body>
</html>

The outcome is just:

<p><span class="s1">Font names that have more than one word — like </span><span class="s2">Trebuchet MS</span><span class="s1"> — need to be surrounded by quotes, for example </span><span class="s2">"Trebuchet MS"</span><span class="s1">.</span></p>
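
One possible workaround for the lost <head>, sketched under assumptions that nobody in this thread has confirmed: leave everything outside <body> untouched and pass only the body contents through the converter. The helper name filter_body_only is hypothetical, and the sketch assumes that <body> and </body> each sit alone on their own line, as in the Cocoa HTML Writer document above.

```shell
#!/bin/sh
# filter_body_only FILE COMMAND [ARGS...]
# Hypothetical helper (not part of pandoc): copies the document head and
# tail unchanged and runs COMMAND only on the lines between <body> and
# </body>. Assumes each of those two tags is alone on its own line.
filter_body_only() {
    input=$1
    shift
    # Everything up to and including the <body> line, unchanged.
    sed -n '1,/<body>/p' "$input"
    # Lines strictly between <body> and </body> go through COMMAND.
    sed -n '/<body>/,/<\/body>/p' "$input" | sed '1d;$d' | "$@"
    # The </body> line and everything after it, unchanged.
    sed -n '/<\/body>/,$p' "$input"
}
```

With pandoc installed, the command from this message could be plugged in as the filter, for example `filter_body_only page.html pandoc -r html -t html+smart`, where page.html is a placeholder file name. The <style> block then survives because pandoc never sees it.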

On Tuesday, December 28, 2021 at 11:26:40 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
If you don't specify an input format, pandoc assumes markdown input, and while markdown allows literal inclusions of HTML elements, it apparently doesn't allow DOCTYPE declarations, so it does not consider that to be HTML, and translates the angle brackets into character entities.
$ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -t html
&lt;!DOCTYPE html&gt;
<ol>
<li>
Bogus
</li>
</ol>
However, if you add "-r html" everything is fine:
$ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -r html -t html
<ol>
<li>Bogus</li>
</ol>
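
As a rough sketch of how a script might choose between the two cases above, the helper below inspects the first non-blank line of a file and prints a reader name to hand to -r. The name guess_reader and the heuristic itself are assumptions made for illustration; pandoc does not provide this check.

```shell
#!/bin/sh
# guess_reader FILE
# Hypothetical helper: prints "html" when the first non-blank line of FILE
# starts with a DOCTYPE declaration or an <html> tag, and "markdown"
# otherwise. The check is a rough heuristic, not something pandoc provides.
guess_reader() {
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case $first in
        '<!DOCTYPE'*|'<!doctype'*|'<html'*|'<HTML'*) echo html ;;
        *) echo markdown ;;
    esac
}
```

A possible use, with input.txt as a placeholder file name: `pandoc -r "$(guess_reader input.txt)" -t html input.txt`.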

On Tue, Dec 28, 2021 at 11:19 AM phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
Thank you for your assistance! Indeed, I misread the situation, though the outcome is still strange. The HTML I am starting with in my clipboard is a complete document with a doctype declaration. The first line is:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Pandoc (pandoc -t html+smart) converts the angle brackets into HTML entity names:

&lt;!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”&gt;

Later on in my process, the content gets converted to RTF using textutil, which removes doctype declarations but retains the line above, converting the entity names back into angle brackets, which is how I got the idea that Pandoc had put it there.

I am not sure why my Pandoc command converts the angle brackets in that first line (it leaves the other angle brackets in the document alone), but I can just remove that line from the clipboard text before processing it with Pandoc, so no problem.
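
The "remove that line before processing" step could look something like the filter below. This is only a sketch: the name strip_doctype is made up for illustration, and it assumes the doctype declaration occupies exactly the first line of the clipboard text.

```shell
#!/bin/sh
# strip_doctype
# Hypothetical filter: reads stdin and drops the first line if it starts
# with a DOCTYPE declaration (in any letter case). Assumes the declaration
# fits on that single line; all other lines pass through unchanged.
strip_doctype() {
    sed '1{/^<![Dd][Oo][Cc][Tt][Yy][Pp][Ee]/d;}'
}
```

It would slot into the original pipeline as one extra stage: `pbpaste | strip_doctype | pandoc -t html+smart | pbcopy`.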
On Tuesday, December 28, 2021 at 10:48:46 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
When standalone is not specified, pandoc typically outputs fragments rather than a complete document. This is convenient for the case where you are processing multiple fragments into one document. (This happens in HTML output but also in other output: groff -ms, ConTeXt, LaTeX.) So normal HTML output I see when I don't specify standalone does *not* include the doctype.
$ echo '* Bogus' | pandoc -r rst -w html
<ul>
<li>Bogus</li>
= </ul>
This is with pandoc 2.16.2, installed with homebrew.

On Tue, Dec 28, 2021 at 9:33 AM Joseph Reagle <josep...-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> wrote:
The doctype declaration is a standard HTML feature and declares the version of the HTML. Pandoc, especially in `--standalone` mode, includes these at the start of an HTML document.

I'm confused, however. You haven't specified standalone mode. (And why would you want them removed in any case?) And the behavior you are describing doesn't correspond to recent versions -- I'm using 2.16.2. I'm not sure when/if pandoc last used HTML4.01 strict.

In any case, you could create your own HTML template, without a doctype declaration.

https://pandoc.org/MANUAL.html#templates
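
A minimal sketch of what such a template could look like, written out by a small shell function. The function name write_nodoctype_template and the file name used below are placeholders; the template is deliberately bare and omits most of what pandoc's default HTML template contains (styles, header includes, and so on).

```shell
#!/bin/sh
# write_nodoctype_template PATH
# Writes a deliberately minimal pandoc HTML template with no doctype line.
# $pagetitle$ and $body$ are pandoc template variables; the quoted 'EOF'
# keeps the shell from expanding the dollar signs.
write_nodoctype_template() {
    cat > "$1" <<'EOF'
<html>
<head>
<meta charset="utf-8">
<title>$pagetitle$</title>
</head>
<body>
$body$
</body>
</html>
EOF
}
```

After `write_nodoctype_template nodoctype.html`, the template would be selected with something like `pandoc --standalone --template=nodoctype.html -r html -t html input.html`, where both file names are placeholders.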

On 21-12-27 15:04, phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
> I am using Pandoc to convert dumb quotes to smart quotes in HTML. The HTML is on my MacOS clipboard:
>
> pbpaste | pandoc -t html+smart | pbcopy
>
> The output begins with
>
> <!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”>
>
> and a blank line.
>
> Is it possible to turn this off?

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e8eac3cc-feb6-e3af-dc9d-d3fe0b964925%40reagle.org.



--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6ae1c100-a3f1-4c6c-b763-3c1f2ace6dbfn%40googlegroups.com.