From: John MACFARLANE
Newsgroups: gmane.text.pandoc
Subject: Re: Using Pandoc for general text processing, e.g. writing a Ctags emitter: source text info in the AST
Date: Wed, 11 Oct 2017 11:20:09 -0700
Message-ID: <20171011182009.GA42638@protagoras>
References: <9c830d97-68ca-4cda-8892-3cad8b2c975d@googlegroups.com>

Unfortunately, I didn't have the forethought to make design choices
early on that would have made this easier.

There's some discussion at https://github.com/jgm/pandoc/issues/684
about adding attributes to all elements of the AST. But even if this
were done, some of the parsers are not designed in a way that would
make it easy to track source positions exactly. (For example, a common
parsing pattern is to extract the content of a list item or block
quote, strip off indentation, and parse it -- but here source
positions get lost.)

I've recently rewritten the LaTeX reader in a way that allows source
positions to be accurately reported. (We now have an initial
tokenization phase, and source positions are included in the tokens.)
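
To make the contrast concrete, here is a minimal sketch in Python. It is
not pandoc's actual code (pandoc is written in Haskell), and the names
Token, tokenize, strip_block_quote and position_of are hypothetical,
chosen only for illustration. It shows a tokenization pass that records
the original line and column in every token, and then what happens when
block-quote markers are stripped and the remaining text is parsed on its
own.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical token type: text plus its position in the input."""
    kind: str    # "word", "space", "newline", or "symbol"
    text: str
    line: int    # 1-based line of the token's first character
    column: int  # 1-based column of the token's first character


# Every character matches one alternative, so no input is skipped.
TOKEN_RE = re.compile(
    r"(?P<word>\w+)|(?P<space>[ \t]+)|(?P<newline>\n)|(?P<symbol>.)"
)


def tokenize(source: str) -> List[Token]:
    """Split source into tokens, tagging each with its line and column."""
    tokens = []
    line, column = 1, 1
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        tokens.append(Token(kind, text, line, column))
        if kind == "newline":
            line, column = line + 1, 1
        else:
            column += len(text)
    return tokens


def strip_block_quote(block: str) -> str:
    """Drop a leading '> ' marker from each line, as a reader might do
    before handing the inner text to a nested parse."""
    return "\n".join(re.sub(r"^> ?", "", ln) for ln in block.split("\n"))


def position_of(tokens: List[Token], text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first token with the given text."""
    for tok in tokens:
        if tok.text == text:
            return (tok.line, tok.column)
    return None


document = "intro\n\n> quoted text\n> more here\n"

# Tokenizing the whole document first keeps positions in the original file.
whole = position_of(tokenize(document), "more")        # (4, 3)

# Extracting the quote, stripping markers, and parsing the result gives
# positions relative to the stripped fragment instead.
inner_text = strip_block_quote("> quoted text\n> more here")
nested = position_of(tokenize(inner_text), "more")     # (2, 1)
```

In the second case the word "more" is reported at line 2, column 1 of the
stripped fragment rather than line 4, column 3 of the file, which is the
kind of information loss described above. Carrying positions inside the
tokens avoids this, because later stages never need to recompute them.
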
The same methods could be used in the other parsers, but this, like
the addition of attributes, would be quite a big change.

+++ gw2286-WLbs8XpHrcb2fBVCVOL8/A@public.gmane.org [Oct 11 17 10:35 ]:
> I'm interested in using Pandoc to write a generic Ctags emitter.
> However, I'm finding this is difficult because it seems impossible to
> connect a node to its original source code in the current API(s).
> Is there any way to access the line number in the source file where a
> node first appears, from the Haskell API or the JSON formatted output?
> If not, is this something that would be feasible to track and expose?
> Or, to stretch the idea a bit, would it be totally crazy to extend
> Pandoc with the ability to attach this kind of metadata to each node,
> or perhaps allow a reader to attach *arbitrary* metadata? In the latter
> case, the structure of the node-level metadata would be a matter of
> convention, and writers would be free to simply ignore it.
> I'm asking about both the feasibility of implementation ("would it be
> possible without having to rewrite huge amounts of code?") and the
> desirability of implementation ("is this something the Pandoc project
> is interested in?").
> For what it's worth, I don't envision this being useful solely for
> Ctags tag generation, although IMO a "format-agnostic" tag generator
> for dozens of markup formats that comes "for free" out of a single
> implementation seems like a good enough prize on its own, so long as
> you can easily add, say, "linenrStart", "linenrEnd", and/or
> "verbatimSource" attributes to the reader. You could also use this
> ability to create Pandoc-based code/text formatters and linters that
> introspect on the contents of nodes, or automatically inject a "view
> source for this section on GitHub" link into top-level headings.