public inbox archive for pandoc-discuss@googlegroups.com
From: Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Experimental citeproc implementation in Rust
Date: Tue, 11 Dec 2018 09:04:11 -0800 (PST)
Message-ID: <78b7f42d-7640-45ff-a359-f59355217af8@googlegroups.com>



Hi,

I've been working on https://github.com/cormacrelf/citeproc-rs, an 
experimental new CSL and CSL-M citation processor written in Rust. The 
tracking issue gives a rough overview of how early in development it is. 
It can't do name blocks yet, let alone disambiguation or structured 
bibliographies, but the foundations are promising. The coolest feature so 
far is the error reporting at parse time: try running it on a style with 
errors like <number variable="issued" />. Contributions or support would 
be welcome.
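
The <number variable="issued" /> example is an error because issued is a CSL 
date variable, while <number> only accepts number variables. A minimal sketch 
of that kind of check, assuming hand-written, partial variable lists (this is 
not citeproc-rs's actual validator, and check_number_variable is a 
hypothetical name):

```rust
/// Partial lists from the CSL 1.0.1 variable appendix (assumption: a real
/// validator would need the complete lists plus the CSL-M additions).
const NUMBER_VARIABLES: &[&str] = &[
    "chapter-number", "collection-number", "edition", "issue",
    "number", "number-of-pages", "number-of-volumes", "volume",
];
const DATE_VARIABLES: &[&str] = &[
    "accessed", "event-date", "issued", "original-date", "submitted",
];

/// Hypothetical check for the `variable` attribute of a `<number>` element.
/// Returns a human-readable message when the variable has the wrong kind.
pub fn check_number_variable(var: &str) -> Result<(), String> {
    if NUMBER_VARIABLES.contains(&var) {
        Ok(())
    } else if DATE_VARIABLES.contains(&var) {
        Err(format!(
            "<number variable=\"{0}\" />: `{0}` is a date variable; \
             use <date variable=\"{0}\"> instead",
            var
        ))
    } else {
        Err(format!(
            "<number variable=\"{0}\" />: `{0}` is not a number variable",
            var
        ))
    }
}
```

Distinguishing "wrong kind of variable" from "unknown variable" is what lets 
the message suggest a fix instead of just rejecting the style.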

I'm raising it here because an interesting possibility could come out of 
it that touches the Pandoc platform.

   - It could *replace citeproc-js* by compiling to WebAssembly that would 
   run in Zotero, browsers and Node.
      - This is one good reason to use Rust, which has excellent WASM 
      tooling. I have nothing against Haskell or working on pandoc-citeproc 
      directly, but Haskell WASM support is just not there yet.
   - It could *feasibly also replace pandoc-citeproc*, and in fact can 
   already build some pandoc JSON output.
   - It could feasibly *also* replace almost *every other citeproc* by 
   exposing a native static library on every target the Rust/LLVM ecosystem 
   supports. That could be wrapped in e.g. PHP, Ruby, Python, and Java, which 
   all have FFI support. It's weird to me that nobody has built a 
   lingua-franca native library yet, given how complex the specification is. 
   It's a similar situation to libxml2 or libgit2: big, complex, but 
   solve-once-use-everywhere.
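
To make the native-library idea concrete, here is a minimal sketch of what a 
C ABI surface could look like in Rust. The names citeproc_format_cite and 
citeproc_string_free are hypothetical (not citeproc-rs's real API), and the 
"formatting" is a placeholder. The point is the ownership pattern: the library 
hands out a heap-allocated C string and exports a matching free function, so 
PHP, Ruby, Python or Java wrappers never release memory with the wrong 
allocator.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point: takes a NUL-terminated UTF-8 cite key and returns
/// a newly allocated C string that the caller must release with
/// `citeproc_string_free`. Returns NULL on invalid input.
///
/// # Safety
/// `key` must be NULL or point to a valid NUL-terminated string.
#[unsafe(no_mangle)] // spelled `#[no_mangle]` on toolchains before Rust 1.82
pub unsafe extern "C" fn citeproc_format_cite(key: *const c_char) -> *mut c_char {
    if key.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller guarantees `key` is a valid C string.
    let key = match unsafe { CStr::from_ptr(key) }.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(), // not valid UTF-8
    };
    // Placeholder output; a real processor would render the CSL style here.
    let rendered = format!("({})", key);
    match CString::new(rendered) {
        // Ownership moves to the caller until it is handed back to us.
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(), // interior NUL byte
    }
}

/// Frees a string returned by `citeproc_format_cite`, using the same
/// allocator that created it. Passing NULL is a no-op.
///
/// # Safety
/// `s` must be NULL or a pointer from `citeproc_format_cite` not yet freed.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn citeproc_string_free(s: *mut c_char) {
    if !s.is_null() {
        // SAFETY: reclaims the allocation created by `CString::into_raw`.
        drop(unsafe { CString::from_raw(s) });
    }
}
```

In a real build the crate would set crate-type = ["staticlib"] in the [lib] 
section of its Cargo.toml, and a C header (hand-written, or generated with a 
tool such as cbindgen) would declare these functions for each language's FFI.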
   
That's one ring to rule them all: a single codebase, fewer competing 
implementations, more uniform output across CSL tools, and less work for 
the community on both bugfixing and CSL evolution. There are also 
long-standing bugs in pandoc-citeproc and citeproc-js that I'm aiming to 
fix in the process, alongside some reworking of the less complete or less 
thought-out extended features, like citeproc-js' abbreviations or the 
fairly hacky and rigid author suppression in both pandoc and citeproc-js.

The second point of that evil plan, replacing pandoc-citeproc, is tricky 
and might need some thinking through, given that: 

   - Using FFI from a Haskell pandoc-citeproc that handles the Pandoc parts 
   is a bit... I don't know.
      - Imagine: pandoc-citeproc deserializes a big JSON document, walks 
      it, parses [@doe, 31] syntax, collects a bunch of cites (with cite IDs 
      attached) and then FFIs out the rest of the job, attaching pandoc JSON to 
      the relevant points at the other end. There would be quite a lot of weird 
      conversions and serialization in this, because Text.Pandoc.Definition 
      doesn't and shouldn't provide a C ABI-compatible memory layout, but it 
      might work. 
   - You could replace the entire pandoc-citeproc JSON filter with a new 
   binary, but the Lua API exists for a reason. If a lot of work is going 
   into this anyway, avoiding the double JSON round trip should probably be 
   one of the goals. Should that be done with a Lua FFI wrapper around 
   citeproc-rs (i.e. the libciteproc static library it builds)? Setting 
   aside the tricky problem of returning owned data structures over FFI 
   without leaking memory, the ffi library is only available with LuaJIT, 
   which as I understand it would have to become a system dependency for 
   Pandoc through an hslua constraint that official Pandoc builds have not 
   specified so far. Alternatively, it wouldn't be too hard to maintain a 
   JSON filter for non-LuaJIT installs, but it would be confusing for users 
   to have two different mechanisms depending on platform or configuration. 
   Maybe JSON is good enough, and maybe serde_json is so fast it won't 
   matter in the end. It would certainly be much simpler.
   - pandoc-citeproc includes syntax parsing that effectively defines part 
   of Pandoc Markdown (e.g. [@doe, 33]), so that syntax would move even 
   further out of tree than it already is. At least there is a good Rust 
   parser combinator library (nom) that could replicate the Parsec code in 
   a way that's fairly comprehensible to Haskell developers. Some of the 
   more advanced display/formatting features of CSL also need support from 
   Pandoc output templates to work correctly. Are we okay with all of that?
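
For a sense of what porting the citation syntax involves, here is a rough 
Rust sketch that parses a simplified subset of the bracketed syntax: [@doe, 33], 
[-@doe], and several cites separated by ";". It deliberately uses only the 
standard library rather than nom, so it stays dependency-free, and it ignores 
prefixes, free-form suffixes and in-text @doe citations; Pandoc's real grammar 
is considerably richer. Cite and parse_cite_group are hypothetical names.

```rust
/// One citation inside a bracketed group, e.g. `@doe, 33`.
#[derive(Debug, PartialEq)]
pub struct Cite {
    pub key: String,
    pub suppress_author: bool,
    pub locator: Option<String>,
}

/// Parses a simplified subset of bracketed citation syntax:
/// `[@key]`, `[@key, locator]` and `[-@key]`, with several cites separated
/// by `;`. Returns `None` if the input does not match this subset.
pub fn parse_cite_group(input: &str) -> Option<Vec<Cite>> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    let mut cites = Vec::new();
    for part in inner.split(';') {
        let part = part.trim();
        // A leading `-` suppresses the author name, as in `[-@doe]`.
        let (suppress_author, rest) = match part.strip_prefix('-') {
            Some(rest) => (true, rest),
            None => (false, part),
        };
        let rest = rest.strip_prefix('@')?;
        // Everything after the first comma is treated as the locator.
        let (key, locator) = match rest.split_once(',') {
            Some((key, loc)) => (key.trim(), Some(loc.trim().to_string())),
            None => (rest.trim(), None),
        };
        let locator = locator.filter(|loc| !loc.is_empty());
        // In this sketch a key must be non-empty and contain no whitespace.
        if key.is_empty() || key.chars().any(char::is_whitespace) {
            return None;
        }
        cites.push(Cite { key: key.to_string(), suppress_author, locator });
    }
    Some(cites)
}
```

A nom version would express the same steps (bracket, optional "-", "@", key, 
optional locator, ";" separator) as small composable parsers, which is closer 
in shape to the existing Parsec code.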
   
If anyone has any input on these interop problems, I'd love to hear it. At 
the moment, it looks like the way forward is to replace the pandoc-citeproc 
binary wholesale, speaking JSON and taking on all the pandoc-specific 
features in Rust.

Cormac

