From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Attribute-less Markdown from web page Html
Date: Fri, 28 Apr 2023 09:49:01 -0700
References: <8AD0B607-B556-48CC-83AA-7D0BACD3B8BE@halloleo.hailmail.net> <1E55C9DB-9A8C-4064-9927-2EC8B70076A0@gmail.com>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

You would get different indentation with `-t commonmark`. markdown_strict follows the '4-space rule'.
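
To compare the two writers side by side, something like the following should do. This is only a sketch: the one-item HTML list is a made-up stand-in for the real page, and the exact spacing may differ between pandoc versions.

```shell
# Hypothetical one-item list standing in for the real web page.
html='<ul><li>list text</li></ul>'

# markdown_strict: this writer follows the 4-space rule for list items.
printf '%s\n' "$html" | pandoc -f html -t markdown_strict

# commonmark: same input, different list indentation.
printf '%s\n' "$html" | pandoc -f html -t commonmark
```

If the commonmark output suits you, `-t commonmark` can simply replace `-t markdown_strict` in your command line.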

> On Apr 26, 2023, at 4:59 PM, Oliver wrote:
>
> Thanks John
>
> `-t markdown_strict-raw_html` does the trick for me!
>
> One thing though with markdown_strict is odd: The text of lists is indented to the next 4-space column:
>
> `-   list text`
>
> Can I somehow tell the markdown_strict writer to use only _one_ space here:
>
> `- list text`
>
> Anyway, thousand time thanks for Pandoc!
>
>
> On 26 Apr 2023, at 15:39, John MacFarlane wrote:
>
>> Turning off -link_attributes should do it, but looks like you tried that.
>>
>> I'd have to look at an example of the input that produces this with these settings.
>>
>> If you don't need fancy features, you could also try `-t commonmark` or `-t markdown_strict`.
>>
>>> On Apr 25, 2023, at 5:32 PM, Oliver wrote:
>>>
>>> Hi all
>>>
>>> I try to use Pandoc to convert web pages to markdown without all the class clutter like `{.underline}`, etc.
>>>
>>> So I try
>>>
>>> ```
>>> pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-header_attributes-auto_identifiers-inline_code_attributes-link_attributes-raw_attribute-simple_tables-multiline_tables-grid_tables page.html
>>> ```
>>>
>>> and it works reasonably well, but I still get a bit of class clutter like
>>>
>>> ```
>>> {.v-visible-sr .js-screen-reader-info}
>>> ```
>>>
>>> or attributes like
>>>
>>> ```
>>> {title="sometext“}
>>> ```
>>>
>>> , both after links
>>>
>>> How can I suppress these?
>>>
>>> I want really only the text and (image) links.
>>>
>>> Any help much appreciated!

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/C0C55D35-D675-4B6E-8D1A-CACEF8F738D1%40gmail.com.