From: "T. Kurt Bond"
Newsgroups: gmane.text.pandoc
Subject: Re: Turn off headers for Mac OS clipboard content output in HTML?
Date: Tue, 28 Dec 2021 11:38:00 -0500

That makes sense; remember, without -s (standalone), pandoc outputs fragments.  I suspect what you need to do is specify -s while also specifying an HTML template that includes the doctype and head information that you want. If all you care about is the CSS, you might try specifying it with --css.
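A minimal sketch of the -s plus template suggestion above, not a tested recipe. The template location and the helper name `smarten_clipboard` are hypothetical. Because the HTML reader drops the input's <head>, the original <style> rules have to be pasted into the template by hand. `$pagetitle$` and `$body$` are standard pandoc template variables; setting `pagetitle` with `--metadata` should keep pandoc from warning about an empty title in standalone mode.

```shell
#!/bin/sh
# Sketch only: the template location and function name are hypothetical.
TEMPLATE="${TMPDIR:-/tmp}/clipboard-template.html"

# Write a minimal pandoc HTML template. The quoted 'EOF' stops the shell
# from expanding $pagetitle$ and $body$, which pandoc fills in itself.
cat > "$TEMPLATE" <<'EOF'
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$pagetitle$</title>
<style type="text/css">
/* Copy the CSS rules from the clipboard document's head here. */
</style>
</head>
<body>
$body$
</body>
</html>
EOF

# Read HTML from the macOS clipboard, render it as a standalone document
# through the template, and put the result back on the clipboard.
smarten_clipboard() {
  pbpaste |
    pandoc -r html -t html+smart -s --template="$TEMPLATE" \
      --metadata pagetitle=clipboard |
    pbcopy
}
```

Source the script once, then run `smarten_clipboard` whenever the clipboard holds HTML. Whether the quotes come out curly still depends on the reader settings: the -r html run quoted below left them straight.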
On Tue, Dec 28, 2021 at 11:32 AM philmac-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:

> The trouble with using -r html or -f html is that this strips out the
> <head> element, so I lose the formatting.
>
> That is, if I apply pandoc -r html -t html+smart to this:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> <meta http-equiv="Content-Style-Type" content="text/css">
> <title></title>
> <meta name="Generator" content="Cocoa HTML Writer">
> <meta name="CocoaVersion" content="2113">
> <style type="text/css">
> p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Arial; color: #151515; -webkit-text-stroke: #151515; background-color: #d5e4ff}
> span.s1 {font-kerning: none}
> span.s2 {font: 16.0px Courier; font-kerning: none; background-color: #f1f1f1}
> </style>
> </head>
> <body>
> <p class="p1"><span class="s1">Font names that have more than one word — like </span><span class="s2">Trebuchet MS</span><span class="s1"> — need to be surrounded by quotes, for example </span><span class="s2">"Trebuchet MS"</span><span class="s1">.</span></p>
> </body>
> </html>
>
> The outcome is just:
>
> <p><span class="s1">Font names that have more than one word — like </span><span class="s2">Trebuchet MS</span><span class="s1"> — need to be surrounded by quotes, for example </span><span class="s2">"Trebuchet MS"</span><span class="s1">.</span></p>
>
> On Tuesday, December 28, 2021 at 11:26:40 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>
>> If you don't specify an input format, pandoc assumes markdown input, and
>> while markdown allows literal inclusions of HTML elements, it apparently
>> doesn't allow DOCTYPE declarations, so it does not consider that to be
>> HTML and translates the angle brackets into character entities.
>>
>> $ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -t html
>> &lt;!DOCTYPE html&gt;
>> <ol>
>> <li>
>> Bogus
>> </li>
>> </ol>
>>
>> However, if you add "-r html", everything is fine:
>>
>> $ echo '<!DOCTYPE html><ol><li>Bogus</li></ol>' | pandoc -r html -t html
>> <ol>
>> <li>Bogus</li>
>> </ol>
>>
>> On Tue, Dec 28, 2021 at 11:19 AM phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
>>
>>> Thank you for your assistance! Indeed, I misread the situation, though
>>> the outcome is still strange. The HTML I am starting with in my clipboard
>>> is a complete document with a doctype declaration. The first line is:
>>>
>>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
>>>
>>> Pandoc (pandoc -t html+smart) converts the angle brackets into HTML
>>> entity names:
>>>
>>> &lt;!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”&gt;
>>>
>>> Later on in my process, the content gets converted to RTF using
>>> textutil, which removes doctype declarations but retains the line above,
>>> converting the entity names back into angle brackets. That is how I got
>>> the idea that Pandoc had put it there.
>>>
>>> I am not sure why my Pandoc command converts the angle brackets in that
>>> first line (it leaves the other angle brackets in the document alone), but
>>> I can just remove that line from the clipboard text before processing it
>>> with Pandoc, so no problem.
>>>
>>> On Tuesday, December 28, 2021 at 10:48:46 AM UTC-5 tkur...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>>>
>>>> When standalone is not specified, pandoc typically outputs fragments
>>>> rather than a complete document. This is convenient when you are
>>>> processing multiple fragments into one document. (This happens not only
>>>> in HTML output but also in other formats: groff -ms, ConTeXt, LaTeX.) So
>>>> the normal HTML output I see when I don't specify standalone does *not*
>>>> include the doctype.
>>>>
>>>> $ echo '* Bogus' | pandoc -r rst -w html
>>>> <ul>
>>>> <li>Bogus</li>
>>>> </ul>
>>>>
>>>> This is with pandoc 2.16.2, installed with Homebrew.
>>>>
>>>> On Tue, Dec 28, 2021 at 9:33 AM Joseph Reagle <josep...-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> wrote:
>>>>
>>>>> The doctype declaration is a standard HTML feature and declares the
>>>>> version of the HTML. Pandoc, especially in `--standalone` mode, includes
>>>>> these at the start of an HTML document.
>>>>>
>>>>> I'm confused, however. You haven't specified standalone mode. (And why
>>>>> would you want them removed in any case?) And the behavior you are
>>>>> describing doesn't correspond to recent versions -- I'm using 2.16.2. I'm
>>>>> not sure when/if pandoc last used HTML 4.01 Strict.
>>>>>
>>>>> In any case, you could create your own HTML template, without a
>>>>> doctype declaration.
>>>>>
>>>>> https://pandoc.org/MANUAL.html#templates
>>>>>
>>>>> On 21-12-27 15:04, phi...-97jfqw80gc6171pxa8y+qA@public.gmane.org wrote:
>>>>> > I am using Pandoc to convert dumb quotes to smart quotes in HTML.
>>>>> > The HTML is on my macOS clipboard:
>>>>> >
>>>>> > pbpaste | pandoc -t html+smart | pbcopy
>>>>> >
>>>>> > The output begins with
>>>>> >
>>>>> > <!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01//EN” “http://www.w3.org/TR/html4/strict.dtd”>
>>>>> >
>>>>> > and a blank line.
>>>>> >
>>>>> > Is it possible to turn this off?

-- 
T. Kurt Bond, tkurtbond-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, https://tkurtbond.github.io

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAN1EhV841GBsc-YMPiJmAdujawjQ9nRAuk7DnFfVrGzpsMgkgg%40mail.gmail.com.