public inbox archive for pandoc-discuss@googlegroups.com
From: Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Experimental citeproc implementation in Rust
Date: Wed, 12 Dec 2018 01:21:51 -0800 (PST)
Message-ID: <9e7db31a-8244-4ac8-800b-25709cedc240@googlegroups.com>
In-Reply-To: <yh480kh8fjj43g.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>

That’s a good point about native Lua modules. I’m looking into a safe API
for that over at rlua <https://github.com/kyren/rlua/issues/105>, but it’s
clearly possible in unsafe Rust.


The output formatting architecture so far doesn’t actually use any
particular internal format; it’s just a trait (like a Haskell typeclass)
with an associated type. PlainText builds Strings and ignores formatting
(and is very fast), whereas Pandoc builds Vec<Inline>, where Inline comes
from the pandoc_types crate. An unsupported CSL formatting instruction
like display="block" would therefore simply be ignored in the Pandoc
implementation. A fully-featured format would encode everything in an Html
type that knows how to serialize itself at the end. I might change this
design because of its code size implications for the WebAssembly output:
Rust's monomorphisation means every formatting-dependent function is
compiled and emitted once per output format (three times here), with
inlining performed on each copy.
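
To make the trait-with-associated-type idea concrete, here is a minimal sketch. The trait name `OutputFormat`, its methods, and the two-variant `Inline` enum are hypothetical simplifications (the real pandoc_types `Inline` has many more constructors), not citeproc-rs’s actual API. `render_title` stands in for a formatting-dependent function that gets monomorphised per format.

```rust
/// Hypothetical output-format trait: each format names the type it builds.
trait OutputFormat {
    type Build;
    fn text(&self, s: &str) -> Self::Build;
    fn italic(&self, inner: Self::Build) -> Self::Build;
    fn join(&self, parts: Vec<Self::Build>) -> Self::Build;
    /// CSL display="block": by default a format just passes the content through.
    fn display_block(&self, inner: Self::Build) -> Self::Build {
        inner
    }
}

struct PlainText;

impl OutputFormat for PlainText {
    type Build = String;
    fn text(&self, s: &str) -> String {
        s.to_string()
    }
    fn italic(&self, inner: String) -> String {
        inner // plain text ignores formatting
    }
    fn join(&self, parts: Vec<String>) -> String {
        parts.concat()
    }
}

/// Simplified, hypothetical stand-in for pandoc_types' Inline.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Emph(Vec<Inline>),
}

struct Pandoc;

impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn text(&self, s: &str) -> Vec<Inline> {
        vec![Inline::Str(s.to_string())]
    }
    fn italic(&self, inner: Vec<Inline>) -> Vec<Inline> {
        vec![Inline::Emph(inner)]
    }
    fn join(&self, parts: Vec<Vec<Inline>>) -> Vec<Inline> {
        parts.into_iter().flatten().collect()
    }
}

/// Generic over the format, so the compiler emits one copy per format used.
fn render_title<O: OutputFormat>(fmt: &O, title: &str, year: &str) -> O::Build {
    let italic_title = fmt.italic(fmt.text(title));
    let body = fmt.join(vec![italic_title, fmt.text(", "), fmt.text(year)]);
    fmt.display_block(body)
}

fn main() {
    println!("{}", render_title(&PlainText, "Dune", "1965")); // Dune, 1965
    println!("{:?}", render_title(&Pandoc, "Dune", "1965"));
}
```

Each call site with a new format type instantiates another copy of `render_title`, which is where the WebAssembly code-size cost comes from.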


In the current architecture, *inputs* are also generic over the output
format. A Cite is specialised for each output format, so its locators and
affixes are specialised too, and any deserialization would be to a
Cite<Pandoc>, which reads Pandoc::Build = Vec<Inline> into its affixes.
This goes through serde's Deserialize trait (via serde_json), which is
easy to implement; the main cost is keeping it in sync with pandoc's AST.
That could be mitigated by doing an incomplete deserialization that leaves
unrecognised nodes in serialized form, so that new AST nodes wouldn't
cause parse errors. But that's probably more work than just maintaining
the sync in the first place.
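
A rough sketch of both ideas, under assumptions: `OutputFormat`, `Cite`, `decode_inline`, `decode_affix` and the simplified `Inline` are hypothetical names, and instead of using serde the input is assumed to be already split into (tag, contents) pairs, mirroring the `t`/`c` fields of pandoc's JSON. Unknown tags are kept verbatim rather than rejected.

```rust
/// Hypothetical output-format trait: each format names the type it builds.
trait OutputFormat {
    type Build;
}

/// Simplified stand-in for pandoc's Inline, plus a fallback variant that
/// keeps unrecognised nodes in serialized form instead of failing.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Space,
    Unknown { tag: String, raw: String },
}

struct PlainText;
impl OutputFormat for PlainText {
    type Build = String;
}

struct Pandoc;
impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
}

/// Affixes use the output format's own type, so Cite<Pandoc> and
/// Cite<PlainText> are distinct types.
struct Cite<O: OutputFormat> {
    id: String,
    prefix: O::Build,
    suffix: O::Build,
}

/// Decode one (tag, contents) pair; unknown tags are preserved, not errors.
fn decode_inline(tag: &str, raw: &str) -> Inline {
    match tag {
        "Str" => Inline::Str(raw.to_string()),
        "Space" => Inline::Space,
        _ => Inline::Unknown { tag: tag.to_string(), raw: raw.to_string() },
    }
}

fn decode_affix(nodes: &[(&str, &str)]) -> Vec<Inline> {
    nodes.iter().map(|&(tag, raw)| decode_inline(tag, raw)).collect()
}

fn main() {
    let cite: Cite<Pandoc> = Cite {
        id: "smith2018".to_string(),
        prefix: decode_affix(&[("Str", "see"), ("Space", "")]),
        // A node type this version does not know about survives as Unknown.
        suffix: decode_affix(&[("FutureNode", r#"["x"]"#)]),
    };
    let plain: Cite<PlainText> = Cite {
        id: "smith2018".to_string(),
        prefix: "see ".to_string(),
        suffix: String::new(),
    };
    println!("{} {:?} {:?}", cite.id, cite.prefix, cite.suffix);
    println!("{} {:?} {:?}", plain.id, plain.prefix, plain.suffix);
}
```

The fallback variant means an older decoder can pass a newer AST node through untouched, at the cost of not being able to format it.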


The BibTeX parsing is a tricky one, though. There’s nom-bibtex
<https://github.com/charlesvdv/nom-bibtex> for the main syntax, at least. I
wouldn’t want to shell out to Pandoc for every single LaTeX text field, but
maybe the Lua API’s read function would help here. It might be simpler to
support both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX
with a simple Rust-based parser, but not at the same time. What do people
use backslash commands for in BibTeX? Are there names and document titles
out there that really need the whole power of LaTeX to render? I might have
to think about this some more. Perhaps the answer is a successor to
CSL-JSON that accepts arbitrary JSON objects wherever the old one accepts
strings.
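
As a rough illustration of a "simple Rust-based parser" for a limited micro-LaTeX, here is a minimal sketch. The supported subset (`\emph`, `\textit`, `\textbf`, bare `{...}` groups treated as case protection, and backslash-escaped characters like `\&`) is an assumption chosen for illustration, not something settled in this thread; unsupported commands with an argument are unwrapped, and commands without one are kept verbatim.

```rust
/// Output of a hypothetical micro-LaTeX parser for BibTeX text fields.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Emph(Vec<Node>),
    Strong(Vec<Node>),
    /// A bare `{...}` group, which BibTeX uses to protect capitalisation.
    NoCase(Vec<Node>),
}

/// Parse a whole field into nodes.
fn parse(input: &str) -> Vec<Node> {
    let chars: Vec<char> = input.chars().collect();
    let mut pos = 0;
    parse_seq(&chars, &mut pos, false)
}

/// Move any pending text into the output as a Text node.
fn flush(text: &mut String, out: &mut Vec<Node>) {
    if !text.is_empty() {
        out.push(Node::Text(std::mem::take(text)));
    }
}

/// Parse until end of input, or until the closing `}` when inside a group.
fn parse_seq(chars: &[char], pos: &mut usize, in_group: bool) -> Vec<Node> {
    let mut out = Vec::new();
    let mut text = String::new();
    while *pos < chars.len() {
        let c = chars[*pos];
        *pos += 1;
        match c {
            '}' if in_group => break,
            '{' => {
                flush(&mut text, &mut out);
                out.push(Node::NoCase(parse_seq(chars, pos, true)));
            }
            '\\' => {
                // `\&`, `\%`, `\{` ...: an escaped literal character.
                if *pos < chars.len() && !chars[*pos].is_ascii_alphabetic() {
                    text.push(chars[*pos]);
                    *pos += 1;
                    continue;
                }
                let start = *pos;
                while *pos < chars.len() && chars[*pos].is_ascii_alphabetic() {
                    *pos += 1;
                }
                let name: String = chars[start..*pos].iter().collect();
                if *pos < chars.len() && chars[*pos] == '{' {
                    *pos += 1;
                    let body = parse_seq(chars, pos, true);
                    flush(&mut text, &mut out);
                    match name.as_str() {
                        "emph" | "textit" => out.push(Node::Emph(body)),
                        "textbf" => out.push(Node::Strong(body)),
                        // Unsupported command: keep its argument, drop the command.
                        _ => out.extend(body),
                    }
                } else {
                    // Command without an argument: keep it verbatim as text.
                    text.push('\\');
                    text.push_str(&name);
                }
            }
            _ => text.push(c),
        }
    }
    flush(&mut text, &mut out);
    out
}

fn main() {
    println!("{:?}", parse(r"The {DNA} of \emph{E. coli} \& friends"));
}
```

A parser this small cannot handle macros, math or accents, so the open question of which backslash commands real BibTeX files rely on still decides whether it would be enough.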


On Wednesday, December 12, 2018 at 5:44:35 AM UTC+11, John MacFarlane wrote:
>
>
> That's an interesting idea.  pandoc-citeproc is still 
> pretty crufty, and it doesn't always behave like 
> citeproc-js, so I can see the point of this. 
>
> The difficulties are that 
>
> - pandoc-citeproc is currently quite tightly 
>   integrated with pandoc; it operates on the pandoc 
>   AST.  So as you note, that capability would have to 
>   be reproduced somehow in citeproc-rs.  I think that 
>   the tree-walking work could be given to a lua filter 
>   that either called out to citeproc-rs or linked to 
>   a version of it.  (I don't think luajit is required 
>   for this; one can write lua modules in C, so it 
>   should be possible to do it in rust.) But citeproc-rs would 
>   still have to be able to handle pandoc JSON. Perhaps 
>   that could just be the underlying format it operates 
>   on (it would have to replace the current HTML-ish 
>   syntax used in citeproc-js, and maybe it would have 
>   to be made more expressive). 
>
> - One potential problem is that citeproc-rs would need to 
>   change, sometimes, when pandoc does.  Currently 
>   that's not a problem since I maintain pandoc-citeproc. 
>
> - pandoc-citeproc does some things citeproc-js does 
>   not do (these are, strictly speaking, extensions to 
>   standard citeproc).  For example, author-in-text 
>   citations, citation prefixes and suffixes, proper 
>   handling of math (that's actually just folded into 
>   general pandoc support), movement of punctuation, 
>   conversion from bibtex/biblatex and other formats. 
>   Note that conversion from bibtex relies on pandoc's 
>   latex parser; to reproduce this functionality, you'd 
>   have to write a latex parser in rust or somehow call 
>   out to pandoc. 
>
> Best, 
> John 
>


