From: Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Experimental citeproc implementation in Rust
Date: Tue, 11 Dec 2018 09:04:11 -0800 (PST)
Message-ID: <78b7f42d-7640-45ff-a359-f59355217af8@googlegroups.com>
Hi,
I've been working on https://github.com/cormacrelf/citeproc-rs, an
experimental new CSL and CSL-M citation processor written in Rust. The one
tracking issue gives a rough overview of how early this is in development.
It can't do name blocks yet, let alone disambiguation or structured
bibliographies, but there are promising foundations. The coolest feature so
far is the error reporting at parse time. Try running it on a style with
errors like <number variable="issued" />. Contributions or support would
be welcome.
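To illustrate the kind of parse-time check meant here, below is a minimal
sketch in Rust. It is not citeproc-rs's actual code: the StyleError type and
the check_number_variable function are hypothetical names, and the list of
number variables is the one given in the CSL 1.0.1 specification. "issued" is
a date variable, so using it in a <number> element is rejected before any
citation is rendered.

```rust
/// Number variables as listed in the CSL 1.0.1 specification.
const NUMBER_VARIABLES: &[&str] = &[
    "chapter-number",
    "collection-number",
    "edition",
    "issue",
    "number",
    "number-of-pages",
    "number-of-volumes",
    "volume",
];

/// Hypothetical diagnostic type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct StyleError {
    pub element: String,
    pub attribute: String,
    pub value: String,
    pub hint: String,
}

/// Check the `variable` attribute of a `<number>` element at parse time.
pub fn check_number_variable(value: &str) -> Result<(), StyleError> {
    if NUMBER_VARIABLES.contains(&value) {
        return Ok(());
    }
    Err(StyleError {
        element: "number".to_string(),
        attribute: "variable".to_string(),
        value: value.to_string(),
        hint: format!("expected one of: {}", NUMBER_VARIABLES.join(", ")),
    })
}
```

Calling check_number_variable("issued") returns an Err whose hint lists the
accepted variables, which is roughly the information a style author needs to
fix the style.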
I'm raising it here because there are some interesting possibilities that
could come out of it, and they touch the Pandoc platform:
- It could *replace citeproc-js* by compiling to WebAssembly that would
  run in Zotero, browsers and Node.
  - This is one good reason to use Rust, which has excellent WASM
    tooling. I have nothing against Haskell or working on pandoc-citeproc
    directly, but Haskell WASM support is just not there yet.
- It could *feasibly also replace pandoc-citeproc*, and in fact can
  already build some pandoc JSON output.
- It could feasibly *also* replace almost *every other citeproc* by
  exposing a native static library on every target the Rust/LLVM ecosystem
  supports. That could be wrapped in e.g. PHP, Ruby, Python, and Java, which
  all have FFI support. It's weird to me that nobody has built a
  lingua-franca native library yet, given how complex the specification is.
  It's a similar situation to libxml2 or libgit2: big, complex, but
  solve-once-use-everywhere.
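
As a rough sketch of what the native-library option involves, the Rust below
shows the usual pattern for handing an owned string across a C ABI and taking
it back to free it. The function names sketch_render_cite and
sketch_free_string are hypothetical, and the "rendering" is a placeholder
rather than real CSL output. A real export would also carry a no_mangle
attribute so the symbol name is stable for PHP, Ruby, Python or Java bindings;
it is left out here to keep the sketch independent of the Rust edition.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point. Takes a NUL-terminated UTF-8 cite key and
/// returns a newly allocated C string, or null on invalid input.
/// The caller owns the result and must pass it to `sketch_free_string`.
pub unsafe extern "C" fn sketch_render_cite(key: *const c_char) -> *mut c_char {
    if key.is_null() {
        return std::ptr::null_mut();
    }
    // Borrow the caller's bytes; reject anything that is not valid UTF-8.
    let key = match unsafe { CStr::from_ptr(key) }.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(),
    };
    // Placeholder formatting only; a real processor would run the style here.
    let rendered = format!("({})", key);
    match CString::new(rendered) {
        // `into_raw` transfers ownership of the allocation to the caller.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Give a string returned by `sketch_render_cite` back to Rust to be freed.
pub unsafe extern "C" fn sketch_free_string(s: *mut c_char) {
    if !s.is_null() {
        // Rebuild the CString so its destructor releases the memory.
        drop(unsafe { CString::from_raw(s) });
    }
}
```

The pairing of into_raw with from_raw is the point: the memory is allocated
and freed by the same library, so a wrapper in another language never has to
guess which allocator to use.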
That's one ring to rule them all, all in a single codebase, fewer competing
implementations, more uniform output across CSL tools and less work for the
community on both bugfixing and CSL evolution. There are also long-standing
bugs in pandoc-citeproc and citeproc-js that I'm aiming to fix in the
process, alongside some reworking of the less-complete or less-thought-out
extended features like citeproc-js' abbreviations or the fairly hacky and
rigid author suppression in both pandoc and citeproc-js.
The second point on that evil plan, replacing pandoc-citeproc, is a bit
tricky, and might need a bit of thinking through, given that:
- Using FFI from a Haskell pandoc-citeproc that handles the Pandoc parts
  is a bit... I don't know.
  - Imagine: pandoc-citeproc deserializes a big JSON document, walks
    it, parses [@doe, 31] syntax, collects a bunch of cites (with cite IDs
    attached) and then FFIs out the rest of the job, attaching pandoc JSON to
    the relevant points at the other end. There would be quite a lot of weird
    conversions and serialization in this, because Text.Pandoc.Definition
    doesn't and shouldn't provide a C ABI-compatible memory layout, but it
    might work.
- You could replace the entire pandoc-citeproc JSON filter with a new
  binary, but the Lua API exists for a reason. Maybe if there's a bunch of
  work going on, avoiding double-JSON should be one of the goals. Is that
  something that should be written with a Lua FFI wrapper around citeproc-rs
  (i.e. the libciteproc static library it builds)? Setting aside the tricky
  problems with how to return owned data structures over FFI without leaking
  memory, FFI is only available with LuaJIT, which as I understand it would
  have to become a system dependency for Pandoc through an hslua constraint
  that has not been specified in official Pandoc builds so far. In the
  alternative, it wouldn't be too hard to maintain a JSON filter for
  non-LuaJIT installs, but it sure would be confusing for users to have two
  ways for different platforms or configurations. Maybe JSON is good enough,
  and maybe serde_json is so fast it won't matter in the end. It would
  certainly be much simpler.
- pandoc-citeproc includes syntax parsing that kinda defines part of
  Pandoc Markdown (i.e. [@doe, 33]), so that would be moving further out of
  tree than it already is. There is a good parser combinator library, at
  least (nom), that could replicate the Parsec code in a way that's fairly
  comprehensible by Haskell developers. Some of the more advanced
  display/formatting features of CSL also need support from Pandoc output
  templates to work correctly. Are we okay with all of that?
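
To give a feel for the syntax parsing mentioned in the last point, here is a
small sketch in Rust. It deliberately uses plain standard-library string
methods instead of nom, so that it stands alone, and it covers only a
simplified subset: one bracket, cites separated by semicolons, an optional
"-" before "@" to suppress the author, and a suffix after the key. Prefixes
such as "see" are not handled, and the Cite struct and function names are
hypothetical rather than anything from pandoc-citeproc or citeproc-rs.

```rust
/// One cite inside a bracketed citation (hypothetical, simplified shape).
#[derive(Debug, PartialEq)]
pub struct Cite {
    pub key: String,
    pub suppress_author: bool,
    pub suffix: String,
}

/// Parse a bracketed citation such as `[@doe, 33; -@smith]`.
/// Returns None if the input is not in the simplified form described above.
pub fn parse_citation(input: &str) -> Option<Vec<Cite>> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    // Collecting into Option<Vec<_>> fails as a whole if any cite fails.
    inner.split(';').map(parse_cite).collect()
}

fn parse_cite(part: &str) -> Option<Cite> {
    let part = part.trim();
    // A leading "-" before "@" marks author suppression.
    let (suppress_author, rest) = match part.strip_prefix("-@") {
        Some(rest) => (true, rest),
        None => (false, part.strip_prefix('@')?),
    };
    // Simplified key rule: alphanumerics plus a few punctuation characters.
    let end = rest
        .find(|c: char| !(c.is_alphanumeric() || "_-:.".contains(c)))
        .unwrap_or(rest.len());
    let (key, tail) = rest.split_at(end);
    if key.is_empty() {
        return None;
    }
    // Everything after the key, minus the separating comma, is the suffix.
    let suffix = tail.trim_start_matches(',').trim().to_string();
    Some(Cite {
        key: key.to_string(),
        suppress_author,
        suffix,
    })
}
```

For the input [@doe, 33; -@smith] this yields two cites: key "doe" with
suffix "33", and key "smith" with the author suppressed. Splitting the suffix
further into a locator label and value is left out.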
If anyone has any input on these interop problems, I'd love to hear it. At
the moment, it looks like the way forward is to replace the pandoc-citeproc
binary wholesale, speaking JSON and taking on all the pandoc-specific
features in Rust.
Cormac