From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gwern Branwen
Newsgroups: gmane.text.pandoc
Subject: Re: html checkbox to markdown
Date: Fri, 5 May 2023 11:27:51 -0400
References: <87zg6ian3w.fsf@zeitkraut.de> <0d96eb75-e25a-44b3-880d-94106f0b2cdbn@googlegroups.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000d177e205faf3f000"

--000000000000d177e205faf3f000
Content-Type:
text/plain; charset="UTF-8"

The Pandoc HTML reader is, perhaps surprisingly, worse at preserving HTML than the Markdown reader, which will generally pass HTML through untouched (because Markdown is defined as a superset of HTML, so raw HTML is legal Markdown input). So if you want to read HTML without erasing stuff, you are generally better off specifying the *Markdown* reader. The results can be kinda ugly, but there's no way around it: there is no 'native' Markdown for a checkbox input, so it uses the raw-HTML fallback.

Example:

    $ echo '<p><input type="checkbox" /></p>' | pandoc -f html -w markdown
    $ echo '<p><input type="checkbox" /></p>' | pandoc -f markdown -w markdown
    ```{=html}
    <p>
    ```
    `<input type="checkbox" />`{=html}
    ```{=html}
    </p>
    ```

The first command prints nothing: the HTML reader can't understand the <input>, so it is silently dropped. The Markdown reader treats it as an HTML fragment embedded in Markdown, which is preserved as a literal and passed through.

-- 
gwern
https://gwern.net
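
To make the comparison in this message easy to repeat on other fragments, here is a minimal sketch of a shell helper. The name `compare_readers` is hypothetical (it is not part of pandoc); it only wraps the two `pandoc -f ... -w markdown` invocations shown in the message and assumes `pandoc` is on your PATH.

```sh
# Sketch only: `compare_readers` is a made-up helper name, not a pandoc feature.
# It feeds the same HTML fragment to pandoc twice, once with the HTML reader
# and once with the Markdown reader, and prints whatever Markdown each emits.
compare_readers() {
    fragment=$1
    for reader in html markdown; do
        # Label each run so empty output (a dropped element) is easy to spot.
        printf '== -f %s ==\n' "$reader"
        printf '%s\n' "$fragment" | pandoc -f "$reader" -w markdown
    done
}
```

You would call it as `compare_readers '<p><input type="checkbox" /></p>'`. If a section comes back empty under the `html` label but not under the `markdown` label, that element is probably one the HTML reader drops.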
--000000000000d177e205faf3f000--