From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oliver
Newsgroups: gmane.text.pandoc
Subject: Re: Attribute-less Markdown from web page Html
Date: Sun, 30 Apr 2023 00:01:54 +1000
Message-ID: <2418447D-A319-4F7C-A9D7-AB5BC8C8892B@halloleo.hailmail.net>
References: <8AD0B607-B556-48CC-83AA-7D0BACD3B8BE@halloleo.hailmail.net> <1E55C9DB-9A8C-4064-9927-2EC8B70076A0@gmail.com> <0B417699-0C93-4DBC-9B09-8C36B6F39B0F@halloleo.hailmail.net>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

Hey, thanks for this! Will try it out.

On 29 Apr 2023, at 15:04, John MacFarlane wrote:

> I think that's just an oversight. I just pushed a change to the commonmark writer so it will use the shortcut forms.
>
>
>> On Apr 28, 2023, at 3:14 PM, Oliver wrote:
>>
>> Cool. But CommonMark uses these strange empty link reference labels (when the link title itself is the link label):
>>
>> [Pandoc Manual][]
>>
>> [Pandoc Manual]: https://pandoc.org/MANUAL.html
>>
>> Is there a way to switch this off? I.e. just:
>>
>> [Pandoc Manual]
>>
>>
>> On 29 Apr 2023, at 2:49, John MacFarlane wrote:
>>
>>> You would get different indentation with `-t commonmark`. markdown_strict follows the '4-space rule'.
>>>
>>>
>>>> On Apr 26, 2023, at 4:59 PM, Oliver wrote:
>>>>
>>>> Thanks John
>>>>
>>>> `-t markdown_strict-raw_html` does the trick for me!
>>>>
>>>> One thing though with markdown_strict is odd: The text of lists is indented to the next 4-space column:
>>>>
>>>> `-   list text`
>>>>
>>>> Can I somehow tell the markdown_strict writer to use only _one_ space here:
>>>>
>>>> `- list text`
>>>>
>>>> Anyway, a thousand thanks for Pandoc!
>>>>
>>>>
>>>> On 26 Apr 2023, at 15:39, John MacFarlane wrote:
>>>>
>>>>> Turning off -link_attributes should do it, but looks like you tried that.
>>>>>
>>>>> I'd have to look at an example of the input that produces this with these settings.
>>>>>
>>>>> If you don't need fancy features, you could also try `-t commonmark` or `-t markdown_strict`.
>>>>>
>>>>>> On Apr 25, 2023, at 5:32 PM, Oliver wrote:
>>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> I'm trying to use Pandoc to convert web pages to markdown without all the class clutter like `{.underline}`, etc.
>>>>>>
>>>>>> So I try
>>>>>>
>>>>>> ```
>>>>>> pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-header_attributes-auto_identifiers-inline_code_attributes-link_attributes-raw_attribute-simple_tables-multiline_tables-grid_tables page.html
>>>>>> ```
>>>>>>
>>>>>> and it works reasonably well, but I still get a bit of class clutter like
>>>>>>
>>>>>> ```
>>>>>> {.v-visible-sr .js-screen-reader-info}
>>>>>> ```
>>>>>>
>>>>>> or attributes like
>>>>>>
>>>>>> ```
>>>>>> {title="sometext“}
>>>>>> ```
>>>>>>
>>>>>> , both after links
>>>>>>
>>>>>> How can I suppress these?
>>>>>>
>>>>>> I really only want the text and (image) links.
>>>>>>
>>>>>> Any help much appreciated!
>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/8AD0B607-B556-48CC-83AA-7D0BACD3B8BE%40halloleo.hailmail.net.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/2418447D-A319-4F7C-A9D7-AB5BC8C8892B%40halloleo.hailmail.net.
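
For anyone who still sees leftover attribute blocks after links (as in the original question) and cannot switch writers, a rough post-processing sketch in Python follows. This is not something proposed in the thread, and it is not a Pandoc feature: the function name `strip_link_attributes` and the regular expression are assumptions made up for illustration. It simply deletes a `{...}` block that directly follows the closing `)` or `]` of a link or image.

```python
import re

# Hypothetical helper, not part of Pandoc: matches an attribute block such as
# {.a .b} or {title="x"} that directly follows the closing ")" or "]" of a
# Markdown link or image. Braces elsewhere in the text are left alone.
ATTR_AFTER_LINK = re.compile(r'(?<=[\)\]])\{[^{}\n]*\}')


def strip_link_attributes(markdown: str) -> str:
    """Remove attribute blocks that immediately follow links or images."""
    return ATTR_AFTER_LINK.sub("", markdown)


if __name__ == "__main__":
    sample = (
        "[Pandoc Manual](https://pandoc.org/MANUAL.html)"
        "{.v-visible-sr .js-screen-reader-info}"
    )
    print(strip_link_attributes(sample))
```

The pattern is deliberately naive: it would also rewrite matching text inside code blocks, and it does not handle an attribute value that itself contains a brace. Using `-t commonmark` or `-t markdown_strict`, as suggested in the thread, avoids the problem at the source.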