From: Albert Krewinkel
Newsgroups: gmane.text.pandoc
Subject: Re: HTML attributes not being stripped off
Date: Mon, 27 Jun 2022 13:37:43 +0200
Message-ID: <87r13abaeb.fsf@zeitkraut.de>

"'guenael Muller' via pandoc-discuss" writes:

> The idea there, is to be able to convert both html (generated by a rich
> text editor) and markdown (or other similar markup language) file
> through a similar pipeline to a pdf with similar style. Using a
> different templating engine somewhere in the pipeline mean more
> complexity, so i'm considering the idea of using pandoc templating if
> the html result is okay.

OK, I see. How about the following approach then: use a custom reader
that passes the input through as raw HTML if any of the files have an
`.html` extension, but otherwise treats the input as Markdown.

``` lua
function Reader (sources, opts)
  -- Switch to raw HTML as soon as one source has an .html extension.
  local raw_html = false
  for _, source in ipairs(sources) do
    if source.name:match '%.htm[l]$' then
      raw_html = true
    end
  end

  if raw_html then
    -- Wrap each source verbatim in a raw HTML block; nothing is parsed.
    local blocks = sources:map(
      function (source)
        return pandoc.RawBlock('html', tostring(source))
      end
    )
    return pandoc.Pandoc(blocks)
  else
    -- Otherwise parse the input as Markdown.
    return pandoc.read(sources, 'markdown', opts)
  end
end
```

See also .

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
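
A note on the file-name check in the reader above: the pattern `'%.htm[l]$'` matches the same names as `'%.html$'`, so a file ending in `.htm` would be read as Markdown. The following is only a sketch in plain Lua (no pandoc needed) to show how the pattern behaves; the helper names `is_html` and `is_html_or_htm` and the file names are made up for illustration and are not part of pandoc.

```lua
-- Hypothetical helper (not a pandoc function): true if `name` ends in ".html".
local function is_html(name)
  -- '%.' matches a literal dot; '$' anchors the match at the end of the name.
  return name:match('%.htm[l]$') ~= nil
end

-- Hypothetical variant that also accepts ".htm": 'l?' makes the "l" optional.
local function is_html_or_htm(name)
  return name:match('%.html?$') ~= nil
end

print(is_html('page.html'))        -- true
print(is_html('notes.md'))         -- false
print(is_html('page.htm'))         -- false
print(is_html_or_htm('page.htm'))  -- true
```

If `.htm` files should also be passed through as raw HTML, swapping in the second pattern in the reader would presumably be enough.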