From: Martin Post
Newsgroups: gmane.text.pandoc
Subject: Re: HTML – Building a linked index page
Date: Sun, 7 Aug 2022 08:37:58 -0700 (PDT)
To: pandoc-discuss

Thank you, Albert. That approach looks promising. As I'm not a developer, I cannot fill in the blanks here, but I hope someone else can. It would be great if such a reader could be used to create index files for whatever one might look for in a large document set: “meta” docs (aka sitemaps), table / image / definition lists…


On Sunday, August 7, 2022 at 4:12:07 PM UTC+2 Albert Krewinkel wrote:
> Can someone suggest how to approach this using only Pandoc (filters)
> instead of using a separate tool?

The information of which header originated in which file is lost during
the conversion, so we can't use a filter. However, we can use a custom
reader, as those have access to both the filenames and the file's
contents. We just need to do the parsing ourselves with `pandoc.read`:

``` lua
function Reader (sources, opts)
  local items = pandoc.List{}
  for i, source in ipairs(sources) do
    local headers = pandoc.read(source, 'markdown', opts).blocks:walk{
      Block = function (blk)
        return blk.t == 'Header'
          and blk -- keep Header elements
          or {}   -- discard everything else
      end
    }
    local current_filename = source.name
    -- TODO: convert headers to list items, add links, append to `items`
    -- ...
  end
  return pandoc.Pandoc{pandoc.BulletList(items)}
end
```

This needs some modification, of course, but I hope the general idea
becomes clear from this.
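
One way the TODO could be filled in is sketched below. This is not Albert's solution, only a guess at what the missing step might look like. It assumes Markdown input, header identifiers generated automatically by pandoc, and HTML output files that share the base name of their input file. The helper `target_for` and the file name `index-reader.lua` are made up for illustration, and the sketch passes `source.text` to `pandoc.read` rather than the source object itself.

``` lua
-- index-reader.lua (hypothetical file name)
-- Sketch: collect the top-level headers of all input files into one
-- flat bullet list of links.

-- Hypothetical helper: map "docs/intro.md" to "docs/intro.html".
-- Assumption: each input file is converted to an HTML file of the same name.
local function target_for (filename)
  return (filename:gsub('%.[^./\\]*$', '')) .. '.html'
end

function Reader (sources, opts)
  local items = pandoc.List{}
  for _, source in ipairs(sources) do
    -- Parse each file on its own so we still know where a header came from.
    local doc = pandoc.read(source.text, 'markdown', opts)
    local target = target_for(source.name)
    for _, blk in ipairs(doc.blocks) do
      if blk.t == 'Header' then
        -- Link text is the header's content; the URL is file plus anchor.
        local link = pandoc.Link(blk.content, target .. '#' .. blk.identifier)
        items:insert({pandoc.Plain{link}})
      end
    end
  end
  return pandoc.Pandoc{pandoc.BulletList(items)}
end
```

It would be called along the lines of `pandoc --from index-reader.lua chapter1.md chapter2.md --output index.html`, where the chapter file names are placeholders. The list is flat: header levels are ignored, and headers nested inside other blocks are skipped. Headers that themselves contain links would produce nested links, which may need extra handling.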

HTH

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
