From: "balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
Newsgroups: gmane.text.pandoc
Subject: Re: Pointers on modifying Plain objects(?)
Date: Sat, 24 Dec 2022 00:37:58 -0800 (PST)
Message-ID: <8f0e8d81-7f0b-49a7-b9b5-d78b19a0b1ban@googlegroups.com>
References: <8af6876b-72cc-448e-9f5e-7d12ccdf2ad8n@googlegroups.com> <878riz8wf4.fsf@zeitkraut.de>
To: pandoc-discuss

Thanks for the pointers, Albert! They did help me get started. Unfortunately, when I started looping through the Plain object, I realized that the individual words were represented as separate elements, so there did not seem to be an easy way to apply strikethrough formatting to the entire sentence. The best I could do was apply the strikethrough word by word, but with that approach the final HTML did not look very pleasing.

In the end, I wound up writing a small Python script that modifies a file in pandoc's native format directly (outside of pandoc) and then feeds the modified file back into pandoc. After a couple of false starts with the regex, and with the native output becoming invalid, I've got it working fairly well for my purposes.

On Thursday, 22 December 2022 at 20:21:19 UTC+8 Albert Krewinkel wrote:
"balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> The specific scenario I'm looking at is a Markdown file such as this:
>
> ### Todo
> - [ ] Foo
> - [X] Quux Qux

This is an interesting case because it is more complex than it seems.
The reason is pandoc's `task_lists` extension, which causes pandoc to
handle these checkboxes specially, converting them to [Str "☐", Space]
and [Str "☒", Space]. So we'll have to match on that in our filter.

A good approach would be to write a filter for Plain, like so:

``` lua
function Plain (plain)
  -- modify the object here
  return plain
end
```

Pandoc will then do all necessary document traversals automatically;
the function gets applied to all `Plain` elements in the document.

To check for the prefix, we'd do something like

``` lua
local done_marker = pandoc.List{pandoc.Str '☒', pandoc.Space()}
local prefix = pandoc.List{plain.content[1], plain.content[2]}
if prefix == done_marker then
  -- modify content
end
```
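
Putting the two snippets above together, a minimal sketch of a complete filter might look like the following. It assumes the goal is to strike through the text of checked items, and it wraps everything after the marker in a single `pandoc.Strikeout` element rather than handling each word separately. Instead of comparing lists, this sketch checks the fields of the first two inlines directly. The `#content >= 3` guard and the file name in the usage note are illustrative choices, not something from the thread.

``` lua
-- Strike through the text of checked task-list items.
-- Sketch only: assumes checked items start with Str "☒" followed by Space.
function Plain (plain)
  local content = plain.content
  local marker, space = content[1], content[2]

  -- Only touch items that start with the "done" marker and have text after it.
  local is_done = #content >= 3
    and marker.t == 'Str' and marker.text == '☒'
    and space.t == 'Space'
  if not is_done then
    return nil  -- returning nil leaves the element unchanged
  end

  -- Collect everything after the marker into one list ...
  local rest = pandoc.List{}
  for i = 3, #content do
    rest:insert(content[i])
  end

  -- ... and wrap it in a single Strikeout, keeping the marker in front.
  return pandoc.Plain{marker, space, pandoc.Strikeout(rest)}
end
```

Saved as, say, `strike-done.lua` (a hypothetical name), it could be run with `pandoc --lua-filter strike-done.lua input.md -o output.html`. Because the remaining inlines end up inside one `Strikeout`, the writer should emit a single struck-through span instead of one per word.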

I hope that's enough to get you started. Happy hacking!


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
