From: Oliver
Newsgroups: gmane.text.pandoc
Subject: Re: Attribute-less Markdown from web page Html
Date: Sat, 29 Apr 2023 08:14:59 +1000
Message-ID: <0B417699-0C93-4DBC-9B09-8C36B6F39B0F@halloleo.hailmail.net>
References: <8AD0B607-B556-48CC-83AA-7D0BACD3B8BE@halloleo.hailmail.net> <1E55C9DB-9A8C-4064-9927-2EC8B70076A0@gmail.com>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Cool. But CommonMark uses these strange empty link reference labels (when the link title itself is the link label):

[Pandoc Manual][]

[Pandoc Manual]: https://pandoc.org/MANUAL.html

Is there a way to switch this off? I.e. just:

[Pandoc Manual]

On 29 Apr 2023, at 2:49, John MacFarlane wrote:

> You would get different indentation with `-t commonmark`. markdown_strict follows the '4-space rule'.
>
>> On Apr 26, 2023, at 4:59 PM, Oliver wrote:
>>
>> Thanks John
>>
>> `-t markdown_strict-raw_html` does the trick for me!
>>
>> One thing though with markdown_strict is odd: The text of lists is indented to the next 4-space column:
>>
>> `-   list text`
>>
>> Can I somehow tell the markdown_strict writer to use only _one_ space here:
>>
>> `- list text`
>>
>> Anyway, thousand time thanks for Pandoc!
>>
>> On 26 Apr 2023, at 15:39, John MacFarlane wrote:
>>
>>> Turning off -link_attributes should do it, but looks like you tried that.
>>>
>>> I'd have to look at an example of the input that produces this with these settings.
>>>
>>> If you don't need fancy features, you could also try `-t commonmark` or `-t markdown_strict`.
>>>
>>>> On Apr 25, 2023, at 5:32 PM, Oliver wrote:
>>>>
>>>> Hi all
>>>>
>>>> I try to use Pandoc to convert web pages to markdown without all the class clutter like `{.underline}`, etc.
>>>>
>>>> So I try
>>>>
>>>> ```
>>>> pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-header_attributes-auto_identifiers-inline_code_attributes-link_attributes-raw_attribute-simple_tables-multiline_tables-grid_tables page.html
>>>> ```
>>>>
>>>> and it works reasonably well, but I still get a bit of class clutter like
>>>>
>>>> ```
>>>> {.v-visible-sr .js-screen-reader-info}
>>>> ```
>>>>
>>>> or attributes like
>>>>
>>>> ```
>>>> {title="sometext“}
>>>> ```
>>>>
>>>> both after links.
>>>>
>>>> How can I suppress these?
>>>>
>>>> I want really only the text and (image) links.
>>>>
>>>> Any help much appreciated!
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/8AD0B607-B556-48CC-83AA-7D0BACD3B8BE%40halloleo.hailmail.net.
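
For the `[Pandoc Manual][]` question at the top of this message, one possible workaround is to post-process the generated Markdown rather than rely on a writer option. The following is only a minimal sketch under that assumption: the function name `to_shortcut_links` is hypothetical, and it is not a pandoc feature. It rewrites collapsed reference links (`[label][]`) to the shortcut form (`[label]`) and leaves the reference definitions alone. It does not skip code blocks or handle labels containing nested brackets, so the output should be checked by hand.

```python
import re

# Matches a collapsed reference link such as "[Pandoc Manual][]":
# a bracketed label with no nested brackets, followed by an empty "[]".
COLLAPSED_REF = re.compile(r"(\[[^\[\]]+\])\[\]")


def to_shortcut_links(markdown: str) -> str:
    """Rewrite collapsed reference links "[label][]" as shortcut links "[label]"."""
    return COLLAPSED_REF.sub(r"\1", markdown)


if __name__ == "__main__":
    sample = (
        "See the [Pandoc Manual][] for details.\n"
        "\n"
        "[Pandoc Manual]: https://pandoc.org/MANUAL.html\n"
    )
    print(to_shortcut_links(sample), end="")
```

Both forms are reference links in CommonMark, so the rewritten text should still resolve against the same `[Pandoc Manual]: ...` definition, provided the shortcut link is not immediately followed by `[` or `(`.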