public inbox archive for pandoc-discuss@googlegroups.com
 help / color / mirror / Atom feed
From: BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: RFC: compiling lpeg into pandoc
Date: Tue, 2 Nov 2021 00:14:20 +0100
Message-ID: <CADAJKhC+Egzr_LAD1BHNYv_ne0Lv3NLXUu1Pftrg9H1pHwXK8w@mail.gmail.com>
In-Reply-To: <87ee806loe.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>

I would like it, especially if it could be used from inside Lua filters.
One could then embed snippets of a DSL in code blocks/spans with an
identifying attribute, in raw blocks/spans with an identifying
pseudo-format, or inside attribute values, parse the DSL inside the Lua
filter, and take different paths depending on the result.

Take for example my string interpolation DSL, for which I currently use a
very slow and very verbose parser written in pure Lua (well, actually
MoonScript :-). In the DSL everything is enclosed in various brackets,
because balanced brackets are easy to match in Lua patterns with the
`%b()` construct. Since Lua patterns don't support alternation or
quantified groups, I have to simulate alternation with huge arrays of
maps which specify patterns to try in succession, along with the labels
for the captures.

Here is one alternative from the table of possible interpolation forms:

``````moon
  [40]: {
    labels: {
      [1]: "path"
      [2]: "truth"
      [3]: "then_expr"
      [4]: "else_expr"
    }
    name: "path_if_non_empty_then_else"
    pat: "^%$%(%s*(%b<>)%s*(%&)%?%s*(%$?%b())%s*%&%!%s*(%$?%b())%s*%)$"
    strip: {
      else_expr: "()"
      path: "<>"
      then_expr: "()"
    }
  }
``````

Embedded in Markdown, a string matched by this alternative looks
something like this:

``````markdown
`$( <var/foo> &? $(<var/bar>) &! $(<var/baz>) )`{.sic}
``````

where `<var/foo>` etc. each reference a value in a mapping of
`key: string` pairs obtained from the metadata, and the whole expression
inserts the value of one of two such variables (`<var/bar>` or
`<var/baz>`) into the document depending on whether a third variable
(`<var/foo>`) is non-empty (basically a ternary).
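To illustrate the table-driven approach, here is a minimal sketch of how a list of such alternatives might be tried in succession. It is not the actual parser: the helper `match_alternatives` and the one-entry `alternatives` table are hypothetical, and only the pattern and the label names are copied from the entry above. The `strip` step is left out because its exact semantics aren't shown here.

``````moon
-- Hypothetical one-entry table; `pat` and `labels` come from the entry above.
alternatives = {
  {
    name: "path_if_non_empty_then_else"
    labels: { "path", "truth", "then_expr", "else_expr" }
    pat: '^%$%(%s*(%b<>)%s*(%&)%?%s*(%$?%b())%s*%&%!%s*(%$?%b())%s*%)$'
  }
}

-- Try each alternative in order. Return the name of the first one whose
-- pattern matches the whole text, plus a table mapping each label to its
-- capture, or nil if no alternative matches.
match_alternatives = (text, alts) ->
  for alt in *alts
    caps = { text\match(alt.pat) }
    if #caps > 0
      result = {}
      for i, label in ipairs alt.labels
        result[label] = caps[i]
      return alt.name, result
  nil
``````

For the Markdown example above, `path` comes back as `<var/foo>`, while `then_expr` and `else_expr` still carry their `$(...)` delimiters (presumably what the `strip` entries deal with). Every alternative is re-matched from the start of the string, which is one reason such a table gets slow as it grows.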

The table of such alternatives is built programmatically in nested loops
over lists of subpatterns. Being able to use lpeg (probably even its re
module[^1], since I have already written a Perl implementation of the DSL
in Pegex[^2]) would both speed things up and reduce the amount of code,
so it would be most welcome!

[^1]: http://www.inf.puc-rio.br/~roberto/lpeg/re.html
[^2]: https://metacpan.org/dist/Pegex/view/lib/Pegex/Syntax.pod
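For comparison, here is a rough sketch of the same form written with the re module. It assumes that `lpeg` and its companion `re` module can be required from the Lua environment (e.g. installed via LuaRocks). The grammar, its rule names and the extra `plain` form are hypothetical and only illustrate the idea. Ordered choice (`/`) replaces the table of alternatives, and a table capture with named groups (`{| ... |}` with `{:name: ... :}`) replaces the label lists.

``````moon
-- Assumes lpeg and its re module are installed and requirable.
re = require "re"

-- Hypothetical grammar: the ternary form plus a plain `$( <path> )` form.
-- Captures keep only the text between < and >, so no stripping is needed.
grammar = [[
  interp  <- (ternary / plain) !.
  ternary <- {| '$(' s {:path: path :} s '&?' s {:then_expr: value :}
                s '&!' s {:else_expr: value :} s ')' |}
  plain   <- {| '$(' s {:path: path :} s ')' |}
  value   <- '$'? '(' s path s ')'
  path    <- '<' { [^<>]+ } '>'
  s       <- %s*
]]

interp = re.compile grammar
``````

Matching the Markdown example with `interp\match` returns a table with `path`, `then_expr` and `else_expr` set to `var/foo`, `var/bar` and `var/baz`. Input that matches neither rule returns `nil`. The real DSL's forms may of course differ from this two-rule sketch.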



On Mon, 1 Nov 2021 at 14:39, Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
wrote:

> Hi all,
>
> I'd like to request views and opinions on the inclusion of a Lua library
> into pandoc: the LPeg library is a small parsing library written in C to
> be used with Lua. http://www.inf.puc-rio.br/~roberto/lpeg/
>
> The motivation is that this library could be used to extend pandoc to
> deal with new formats. A good example is jgm's "lunamark", which could
> be modified to support custom syntax extensions that might otherwise be
> difficult to implement with current filters.
> https://github.com/jgm/lunamark/blob/master/lunamark/reader/markdown.lua
>
> On the other hand, this is yet another library and could be considered
> bloat. Most users, esp. on Mac and Linux, should not find it too
> difficult to install the library on their system should they need it.
>
> The question is therefore whether you think including lpeg into
> pandoc would be worth it; e.g., do you believe you'd write such
> LPeg-based parsers? Would you share them with people who wouldn't
> immediately know how to install the library themselves?
>
> Thanks in advance,
> Albert
>
>
> PS: The corresponding pull request is
>     https://github.com/jgm/pandoc/pull/7649
>
> --
> Albert Krewinkel
> GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/87ee806loe.fsf%40zeitkraut.de
> .
>

