From: Oliver
Newsgroups: gmane.text.pandoc
Subject: Re: Attribute-less Markdown from web page Html
Date: Thu, 27 Apr 2023 09:59:23 +1000
Thanks John,

`-t markdown_strict-raw_html` does the trick for me!

One thing about markdown_strict is odd, though: list text is indented to the next 4-space column:

`-   list text`

Can I somehow tell the markdown_strict writer to use only _one_ space instead, like this?

`- list text`

Anyway, a thousand thanks for Pandoc!

On 26 Apr 2023, at 15:39, John MacFarlane wrote:

> Turning off -link_attributes should do it, but it looks like you tried that.
>
> I'd have to look at an example of the input that produces this with these settings.
>
> If you don't need fancy features, you could also try `-t commonmark` or `-t markdown_strict`.
>
>> On Apr 25, 2023, at 5:32 PM, Oliver wrote:
>>
>> Hi all
>>
>> I'm trying to use Pandoc to convert web pages to Markdown without all the class clutter like `{.underline}`, etc.
>>
>> So I tried
>>
>> ```
>> pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-header_attributes-auto_identifiers-inline_code_attributes-link_attributes-raw_attribute-simple_tables-multiline_tables-grid_tables page.html
>> ```
>>
>> and it works reasonably well, but I still get a bit of class clutter like
>>
>> ```
>> {.v-visible-sr .js-screen-reader-info}
>> ```
>>
>> or attributes like
>>
>> ```
>> {title="sometext“}
>> ```
>>
>> both after links.
>>
>> How can I suppress these?
>>
>> I really only want the text and (image) links.
>>
>> Any help much appreciated!
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/8AD0B607-B556-48CC-83AA-7D0BACD3B8BE%40halloleo.hailmail.net.
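The list-indentation question is left open in this thread. If the padding can't be changed in the writer itself (an assumption, not something confirmed here), one workaround is to post-process pandoc's output. Below is a minimal Python sketch; the function name `tighten_bullets` and the script name `tighten_bullets.py` are hypothetical. It collapses the spaces after a bullet marker (`-`, `*`, `+`) down to one. Ordered-list markers and continuation lines are left as they are. It does not recognise indented code blocks, so a code line that looks like a bullet would also be rewritten.

```python
import re
import sys

# A bullet marker (-, * or +), optionally indented, followed by two or more
# spaces and then the item text, e.g. "-   list text" as written by
# pandoc's markdown_strict writer.
_BULLET_PADDING = re.compile(r"^(?P<indent>[ \t]*)(?P<marker>[-*+]) {2,}(?=\S)")


def tighten_bullets(markdown: str) -> str:
    """Collapse the padding after each bullet marker to a single space.

    Only the marker line of each item is rewritten; continuation and nested
    lines keep their original indentation. Indented code blocks are NOT
    detected (limitation of this sketch).
    """
    lines = markdown.splitlines(keepends=True)
    return "".join(
        _BULLET_PADDING.sub(r"\g<indent>\g<marker> ", line, count=1)
        for line in lines
    )


if __name__ == "__main__":
    # Filter mode: read Markdown on stdin, write the tightened text to stdout.
    sys.stdout.write(tighten_bullets(sys.stdin.read()))
```

Usage, with the options from this thread: `pandoc -f html -t markdown_strict-raw_html page.html | python3 tighten_bullets.py > page.md`. A line-based regex keeps the sketch short. A more robust approach would need to parse the Markdown so that code blocks are skipped.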