From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org>,
pandoc-discuss
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Experimental citeproc implementation in Rust
Date: Tue, 11 Dec 2018 10:44:19 -0800
Message-ID: <yh480kh8fjj43g.fsf@johnmacfarlane.net>
In-Reply-To: <78b7f42d-7640-45ff-a359-f59355217af8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
That's an interesting idea. pandoc-citeproc is still
pretty crufty, and it doesn't always behave like
citeproc-js, so I can see the point of this.
The difficulties are that
- pandoc-citeproc is currently quite tightly
integrated with pandoc; it operates on the pandoc
AST. So as you note, that capability would have to
be reproduced somehow in citeproc-rs. I think that
the tree-walking work could be given to a lua filter
that either called out to citeproc-rs or linked to
a version of it. (I don't think luajit is required
for this; one can write lua modules in C, so it
should be possible to do it in rust.) But citeproc-rs would
still have to be able to handle pandoc JSON. Perhaps
that could just be the underlying format it operates
on (it would have to replace the current HTML-ish
syntax used in citeproc-js, and maybe it would have
to be made more expressive).
- One potential problem is that citeproc-rs would need to
change, sometimes, when pandoc does. Currently
that's not a problem since I maintain pandoc-citeproc.
- pandoc-citeproc does some things citeproc-js does
not do (these are, strictly speaking, extensions to
standard citeproc). For example, author-in-text
citations, citation prefixes and suffixes, proper
handling of math (that's actually just folded into
general pandoc support), movement of punctuation,
conversion from bibtex/biblatex and other formats.
Note that conversion from bibtex relies on pandoc's
latex parser; to reproduce this functionality, you'd
have to write a latex parser in rust or somehow call
out to pandoc.
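
To make the tree-walking step in the first point concrete, here is a rough sketch in Rust of collecting citations from a pandoc-like document tree. The Citation, Inline and Block types below are simplified, hypothetical stand-ins for a small corner of the pandoc AST; they are not the real Text.Pandoc.Definition types, not pandoc's JSON schema, and not any existing citeproc-rs API. A real tool would first deserialize pandoc JSON into something like this.

```rust
// Hypothetical, heavily simplified stand-ins for a few pandoc AST node types.
// They are NOT the real Text.Pandoc.Definition types or any citeproc-rs API.
#[derive(Debug, Clone, PartialEq)]
struct Citation {
    id: String,     // cite key, e.g. "doe"
    suffix: String, // locator text, e.g. "31"
}

#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Space,
    Emph(Vec<Inline>),
    // A citation group such as [@doe, 31], plus its fallback rendering.
    Cite(Vec<Citation>, Vec<Inline>),
}

#[derive(Debug, Clone, PartialEq)]
enum Block {
    Para(Vec<Inline>),
    BlockQuote(Vec<Block>),
}

// Collect citations from a run of inlines, in document order.
fn walk_inlines(inlines: &[Inline], out: &mut Vec<Citation>) {
    for inline in inlines {
        match inline {
            Inline::Cite(citations, _fallback) => out.extend(citations.iter().cloned()),
            Inline::Emph(children) => walk_inlines(children, out),
            Inline::Str(_) | Inline::Space => {}
        }
    }
}

// Recurse through block containers until we reach inline content.
fn walk_blocks(blocks: &[Block], out: &mut Vec<Citation>) {
    for block in blocks {
        match block {
            Block::Para(inlines) => walk_inlines(inlines, out),
            Block::BlockQuote(children) => walk_blocks(children, out),
        }
    }
}

fn collect_citations(blocks: &[Block]) -> Vec<Citation> {
    let mut out = Vec::new();
    walk_blocks(blocks, &mut out);
    out
}

// Helper for building a one-item citation group (illustration only).
fn cite(id: &str, suffix: &str) -> Inline {
    let c = Citation { id: id.to_string(), suffix: suffix.to_string() };
    Inline::Cite(vec![c], vec![Inline::Str(format!("[@{}, {}]", id, suffix))])
}

fn sample_doc() -> Vec<Block> {
    vec![
        Block::Para(vec![Inline::Str("See".into()), Inline::Space, cite("doe", "31")]),
        Block::BlockQuote(vec![Block::Para(vec![Inline::Emph(vec![cite("roe", "33")])])]),
    ]
}

fn main() {
    for c in collect_citations(&sample_doc()) {
        println!("{} ({})", c.id, c.suffix);
    }
}
```

The collected list is what would be handed to the citation processor; the rendered output would then have to be spliced back in at each Cite node, which is the part that ties the processor to the pandoc AST.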
Best,
John
Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:
> Hi,
>
> I've been working on https://github.com/cormacrelf/citeproc-rs, an
> experimental new CSL and CSL-M citation processor written in Rust. The one
> tracking issue gives a rough overview of how early this is in development.
> It can't do name blocks yet, let alone disambiguation or structured
> bibliographies, but there are promising foundations. The coolest feature so
> far is the error reporting at parse time. Try running it on a style with
> errors like <number variable="issued" />. Contributions or support would
> be welcome.
>
> I'm raising it here because there's an interesting possibility that could
> come out of it, that touches the Pandoc platform.
>
> - It could *replace citeproc-js* by compiling to WebAssembly that would
> run in Zotero, browsers and Node.
> - This is one good reason to use Rust, which has excellent WASM
> tooling. I have nothing against Haskell or working on pandoc-citeproc
> directly, but Haskell WASM support is just not there yet.
> - It could *feasibly also replace pandoc-citeproc*, and in fact can
> already build some pandoc JSON output.
> - It could feasibly *also* replace almost *every other citeproc* by
> exposing a native static library on every target the Rust/LLVM ecosystem
> supports. That could be wrapped in e.g. PHP, Ruby, Python, and Java, which
> all have FFI support. It's weird to me that nobody has built a
> lingua-franca native library yet, given how complex the specification is.
> It's a similar situation to libxml2 or libgit2: big, complex, but
> solve-once-use-everywhere.
>
> That's one ring to rule them all, all in a single codebase, fewer competing
> implementations, more uniform output across CSL tools and less work for the
> community on both bugfixing and CSL evolution. There are also long-standing
> bugs in pandoc-citeproc and citeproc-js that I'm aiming to fix in the
> process, alongside some reworking of the less-complete or less-thought-out
> extended features like citeproc-js' abbreviations or the fairly hacky and
> rigid author suppression in both pandoc and citeproc-js.
>
> The second point on that evil plan, replacing pandoc-citeproc, is a bit
> tricky, and might need a bit of thinking through, given that:
>
> - Using FFI from a Haskell pandoc-citeproc that handles the Pandoc parts
> is a bit... I don't know.
> - Imagine: pandoc-citeproc deserializes a big JSON document, walks
> it, parses [@doe, 31] syntax, collects a bunch of cites (with cite IDs
> attached) and then FFIs out the rest of the job, attaching pandoc JSON to
> the relevant points at the other end. There would be quite a lot of weird
> conversions and serialization in this, because Text.Pandoc.Definition
> doesn't and shouldn't provide a C ABI-compatible memory layout, but it
> might work.
> - You could replace the entire pandoc-citeproc JSON filter with a new
> binary, but the Lua API exists for a reason. Maybe if there's a bunch of
> work going on, avoiding double-JSON should be one of the goals. Is that
> something that should be written with a Lua FFI wrapper around citeproc-rs
> (i.e. the libciteproc static library it builds)? Setting aside the tricky
> problems with how to return owned datastructures over FFI without leaking
> memory, FFI is only available with LuaJIT, which as I understand it would
> have to become a system dependency for Pandoc through an hslua constraint
> that has not been specified in official Pandoc builds so far. In the
> alternative, it wouldn't be too hard to maintain a JSON filter for
> non-LuaJIT installs, but it sure would be confusing for users to have two
> ways for different platforms or configurations. Maybe JSON is good enough,
> and maybe serde_json is so fast it won't matter in the end. It would
> certainly be much simpler.
> - pandoc-citeproc includes syntax parsing that kinda defines part of
> Pandoc Markdown (i.e. [@doe, 33]), so that would be moving further out of
> tree than it already is. There is a good parser combinator library, at
> least (nom), that could replicate the Parsec code in a way that's fairly
> comprehensible by Haskell developers. Some of the more advanced
> display/formatting features of CSL also need support from Pandoc output
> templates to work correctly. Are we okay with all of that?
>
> If anyone has any input on these interop problems, I'd love to hear it. At
> the moment, it looks like the way forward is to replace the pandoc-citeproc
> binary wholesale, speaking JSON and taking on all the pandoc-specific
> features in Rust.
>
> Cormac
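
On the question raised above of returning owned data over FFI without leaking memory, one common pattern is for the library to hand out a heap-allocated string and also export a matching free function, so the allocation is always released by the same allocator that created it. The following is only a sketch of that pattern: the names citeproc_format_cite and citeproc_free_string are made up for illustration and are not part of citeproc-rs, and the "rendering" is a placeholder. A real shared library would additionally export these symbols unmangled.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point: takes a NUL-terminated cite key and returns a
/// newly allocated C string. Ownership passes to the caller, who must give
/// it back via `citeproc_free_string`. Returns NULL on bad input.
pub extern "C" fn citeproc_format_cite(key: *const c_char) -> *mut c_char {
    if key.is_null() {
        return std::ptr::null_mut();
    }
    // Safety: the caller promises `key` points to a valid NUL-terminated string.
    let key = unsafe { CStr::from_ptr(key) }.to_string_lossy();
    // Placeholder for real citation rendering.
    let rendered = format!("({})", key);
    match CString::new(rendered) {
        // into_raw leaks the allocation on purpose; the caller now owns it.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Reclaims a string previously returned by `citeproc_format_cite`.
pub extern "C" fn citeproc_free_string(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    // Safety: `s` must come from CString::into_raw and not be freed twice.
    unsafe { drop(CString::from_raw(s)) };
}

fn main() {
    let key = CString::new("doe").unwrap();
    let out = citeproc_format_cite(key.as_ptr());
    let text = unsafe { CStr::from_ptr(out) }.to_str().unwrap().to_owned();
    println!("{}", text);
    citeproc_free_string(out);
}
```

Whether the caller is a Lua C module, Haskell's FFI or something else, the contract is the same: every pointer returned by the library is released by exactly one call to the library's own free function.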