From: Albert Krewinkel
Newsgroups: gmane.text.pandoc
Subject: Re: Pointers on modifying Plain objects(?)
Date: Sat, 24 Dec 2022 11:26:40 +0100
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

We could do this by passing the full content to the strikeout constructor. We'd remove the checkbox first, then re-add it afterwards:

plain.content:remove(2) -- remove space
plain.content:remove(1) -- remove checkbox
-- `..` joins two lists, so wrap the Strikeout element in a list
plain.content = done_marker .. {pandoc.Strikeout(plain.content)}
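
Putting the pieces from this thread together, a complete filter might look like the sketch below. It is only an illustration under a few assumptions: the `task_lists` extension is enabled, a checked box arrives as `Str "☒"` followed by `Space`, and the prefix is tested field by field (via `.t` and `.text`) rather than with list equality, so that short or empty items are handled without errors. The helper name `is_done` is made up for this example.

```lua
-- Sketch: strike through the text of completed task-list items.
-- Assumption: the checked box is the string "☒" followed by a space.
local DONE = '☒'

--- True if the inlines start with the checked-box marker and a space.
-- (`is_done` is a hypothetical helper, not part of pandoc.)
local function is_done (inlines)
  local box, space = inlines[1], inlines[2]
  return box ~= nil and space ~= nil
    and box.t == 'Str' and box.text == DONE
    and space.t == 'Space'
end

function Plain (plain)
  local content = plain.content
  if not is_done(content) then
    return nil  -- returning nil leaves the element unchanged
  end
  -- Split off the checkbox and the space; `content` keeps the rest.
  local box = table.remove(content, 1)
  local space = table.remove(content, 1)
  -- Wrap the remaining inlines in one Strikeout, checkbox back in front.
  plain.content = {box, space, pandoc.Strikeout(content)}
  return plain
end
```

Saved under a name such as `strike-done.lua` (a made-up file name), it could be applied with `pandoc --lua-filter=strike-done.lua`. Because the whole remainder goes into a single `Strikeout`, the output should contain one strikethrough span per item instead of one per word.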


On 24 December 2022 at 09:37:58 CET, "balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <balaji.dutt-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Thanks for the pointers, Albert! It did help me get started. Unfortunately, when I started looping through the Plain object, I realized that the individual strings were represented as separate elements, so there did not seem to be an easy way to apply strikethrough formatting to the entire sentence. The best I could do was apply the strikethrough word by word, but with that approach the final HTML did not look very pleasing.

In the end, I wound up writing a small Python script that modifies a file in pandoc's native format directly (outside of pandoc) and then feeds the modified native file back into pandoc. After a couple of false starts with the regex, and with the native output becoming invalid, I've got it working fairly well for my purposes.

On Thursday, 22 December 2022 at 20:21:19 UTC+8 Albert Krewinkel wrote:
"balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> The specific scenario I'm looking at is a Markdown file such as this:
>
> ### Todo
> - [ ] Foo
> - [X] Quux Qux

This is an interesting case because it is more complex than it seems.
The reason is pandoc's `task_lists` extension, which causes pandoc to
handle these checkboxes specially, converting them to [Str "☐", Space]
and [Str "☒", Space]. So we'll have to match on that in our filter.
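
To make the matching concrete, here is a small sketch of a helper that classifies a list of inlines by its leading checkbox. The name `checkbox_state` is invented for this illustration, and the code assumes that the first two inlines expose the usual `.t` (tag) and `.text` fields.

```lua
-- Checkbox characters produced for task-list items (see above).
local OPEN, DONE = '☐', '☒'

--- Classify a list of inlines by its leading checkbox.
-- Returns 'open', 'done', or nil when there is no checkbox prefix.
-- `checkbox_state` is a hypothetical helper name, not part of pandoc.
function checkbox_state (inlines)
  local box, space = inlines[1], inlines[2]
  if not (box and space) then return nil end
  if box.t ~= 'Str' or space.t ~= 'Space' then return nil end
  if box.text == OPEN then return 'open' end
  if box.text == DONE then return 'done' end
  return nil
end
```

For the sample file, the item `- [ ] Foo` would be classified as 'open' and `- [X] Quux Qux` as 'done'; a Plain without a checkbox yields nil.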

A good approach would be to write a filter for Plain, like so:

``` lua
function Plain (plain)
  -- modify the object here
  return plain
end
```

Pandoc will then do all necessary document traversals automatically;
the function gets applied to all `Plain` elements in the document.

To check for the prefix, we'd do something like

``` lua
local done_marker = pandoc.List{pandoc.Str '☒', pandoc.Space()}
local prefix = pandoc.List{plain.content[1], plain.content[2]}
if prefix == done_marker then
  -- modify content
end
```

I hope that's enough to get you started. Happy hacking!


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/C927BB76-A05B-48E2-8277-0DED656D13CA%40zeitkraut.de.