public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org>,
	pandoc-discuss
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Experimental citeproc implementation in Rust
Date: Tue, 11 Dec 2018 10:44:19 -0800
Message-ID: <yh480kh8fjj43g.fsf@johnmacfarlane.net>
In-Reply-To: <78b7f42d-7640-45ff-a359-f59355217af8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


That's an interesting idea.  pandoc-citeproc is still
pretty crufty, and it doesn't always behave like
citeproc-js, so I can see the point of this.

The difficulties are that

- pandoc-citeproc is currently quite tightly
  integrated with pandoc; it operates on the pandoc
  AST.  So as you note, that capability would have to
  be reproduced somehow in citeproc-rs.  I think that
  the tree-walking work could be given to a lua filter
  that either called out to citeproc-rs or linked to
  a version of it.  (I don't think luajit is required
  for this; one can write lua modules in C, so it
  should be possible to do it in rust.) But citeproc-rs would
  still have to be able to handle pandoc JSON. Perhaps
  that could just be the underlying format it operates
  on (it would have to replace the current HTML-ish
  syntax used in citeproc-js, and maybe it would have
  to be made more expressive).

- One potential problem is that citeproc-rs would need to
  change, sometimes, when pandoc does.  Currently
  that's not a problem since I maintain pandoc-citeproc.

- pandoc-citeproc does some things citeproc-js does
  not do (these are, strictly speaking, extensions to
  standard citeproc).  For example, author-in-text
  citations, citation prefixes and suffixes, proper
  handling of math (that's actually just folded into
  general pandoc support), movement of punctuation,
  conversion from bibtex/biblatex and other formats.
  Note that conversion from bibtex relies on pandoc's
  latex parser; to reproduce this functionality, you'd
  have to write a latex parser in rust or somehow call
  out to pandoc.
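
To make the first point concrete, here is a minimal Python sketch of the tree-walking step: it walks decoded pandoc JSON and collects the citations that a processor such as citeproc-rs would then have to render. The element layout ("t"/"c" keys, "citationId", "citationSuffix", "citationMode") is assumed to follow pandoc's JSON serialization of Cite elements; the sample document and the helper names walk, plain_text and collect_citations are hypothetical, chosen for illustration only. It is not how pandoc-citeproc or citeproc-rs actually works.

```python
import json

def walk(node, visit):
    """Depth-first walk over decoded pandoc JSON, calling visit on every tagged element."""
    if isinstance(node, dict):
        if "t" in node:
            visit(node)
        for value in node.values():
            walk(value, visit)
    elif isinstance(node, list):
        for item in node:
            walk(item, visit)

def plain_text(inlines):
    """Flatten a list of inlines to text (sketch: only Str and Space are handled)."""
    parts = []
    for el in inlines:
        if el["t"] == "Str":
            parts.append(el["c"])
        elif el["t"] == "Space":
            parts.append(" ")
    return "".join(parts)

def collect_citations(doc):
    """Return (id, mode, suffix) for every citation found in the document body."""
    found = []

    def visit(el):
        if el["t"] == "Cite":
            citations, _rendered_inlines = el["c"]
            for c in citations:
                found.append((c["citationId"],
                              c["citationMode"]["t"],
                              plain_text(c["citationSuffix"])))

    walk(doc.get("blocks", []), visit)
    return found

# Hypothetical document: roughly what "[@doe, 31]" is assumed to look like as pandoc JSON.
SAMPLE = json.loads("""
{"pandoc-api-version": [1, 17, 5, 4],
 "meta": {},
 "blocks": [{"t": "Para", "c": [
   {"t": "Cite", "c": [
     [{"citationId": "doe",
       "citationPrefix": [],
       "citationSuffix": [{"t": "Str", "c": ","}, {"t": "Space"}, {"t": "Str", "c": "31"}],
       "citationMode": {"t": "NormalCitation"},
       "citationNoteNum": 0,
       "citationHash": 0}],
     [{"t": "Str", "c": "[@doe,"}, {"t": "Space"}, {"t": "Str", "c": "31]"}]
   ]}
 ]}]}
""")

print(collect_citations(SAMPLE))
```

A real JSON filter would read the document from standard input and write the modified document back to standard output; the sketch stops at collecting the citations, which is the part that would be handed to the citation processor.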

Best,
John


Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:

> Hi,
>
> I've been working on https://github.com/cormacrelf/citeproc-rs, an 
> experimental new CSL and CSL-M citation processor written in Rust. The one 
> tracking issue gives a rough overview of how early this is in development. 
> It can't do name blocks yet, let alone disambiguation or structured 
> bibliographies, but there are promising foundations. The coolest feature so 
> far is the error reporting at parse time. Try running it on a style with 
> errors like <number variable="issued" />. Contributions or support would 
> be welcome.
>
> I'm raising it here because there's an interesting possibility that could 
> come out of it, that touches the Pandoc platform.
>
>    - It could *replace citeproc-js* by compiling to WebAssembly that would 
>    run in Zotero, browsers and Node.
>       - This is one good reason to use Rust, which has excellent WASM 
>       tooling. I have nothing against Haskell or working on pandoc-citeproc 
>       directly, but Haskell WASM support is just not there yet.
>    - It could *feasibly also replace pandoc-citeproc*, and in fact can 
>    already build some pandoc JSON output.
>    - It could feasibly *also* replace almost *every other citeproc* by 
>    exposing a native static library on every target the Rust/LLVM ecosystem 
>    supports. That could be wrapped in e.g. PHP, Ruby, Python, and Java, which 
>    all have FFI support. It's weird to me that nobody has built a 
>    lingua-franca native library yet, given how complex the specification is. 
>    It's a similar situation to libxml2 or libgit2: big, complex, but 
>    solve-once-use-everywhere.
>    
> That's one ring to rule them all, all in a single codebase, fewer competing 
> implementations, more uniform output across CSL tools and less work for the 
> community on both bugfixing and CSL evolution. There are also long-standing 
> bugs in pandoc-citeproc and citeproc-js that I'm aiming to fix in the 
> process, alongside some reworking of the less-complete or less-thought-out 
> extended features like citeproc-js' abbreviations or the fairly hacky and 
> rigid author suppression in both pandoc and citeproc-js.
>
> The second point on that evil plan, replacing pandoc-citeproc, is a bit 
> tricky, and might need a bit of thinking through, given that: 
>
>    - Using FFI from a Haskell pandoc-citeproc that handles the Pandoc parts 
>    is a bit... I don't know.
>       - Imagine: pandoc-citeproc deserializes a big JSON document, walks 
>       it, parses [@doe, 31] syntax, collects a bunch of cites (with cite IDs 
>       attached) and then FFIs out the rest of the job, attaching pandoc JSON to 
>       the relevant points at the other end. There would be quite a lot of weird 
>       conversions and serialization in this, because Text.Pandoc.Definition 
>       doesn't and shouldn't provide a C ABI-compatible memory layout, but it 
>       might work. 
>    - You could replace the entire pandoc-citeproc JSON filter with a new 
>    binary, but the Lua API exists for a reason. Maybe if there's a bunch of 
>    work going on, avoiding double-JSON should be one of the goals. Is that 
>    something that should be written with a Lua FFI wrapper around citeproc-rs 
>    (i.e. the libciteproc static library it builds)? Setting aside the tricky 
>    problems with how to return owned datastructures over FFI without leaking 
>    memory, FFI is only available with LuaJIT, which as I understand it would 
>    have to become a system dependency for Pandoc through an hslua constraint 
>    that has not been specified in official Pandoc builds so far. In the 
>    alternative, it wouldn't be too hard to maintain a JSON filter for 
>    non-LuaJIT installs, but it sure would be confusing for users to have two 
>    ways for different platforms or configurations. Maybe JSON is good enough, 
>    and maybe serde_json is so fast it won't matter in the end. It would 
>    certainly be much simpler.
>    - pandoc-citeproc includes syntax parsing that kinda defines part of 
>    Pandoc Markdown (i.e. [@doe, 33]), so that would be moving further out of 
>    tree than it already is. There is a good parser combinator library, at 
>    least (nom), that could replicate the Parsec code in a way that's fairly 
>    comprehensible by Haskell developers. Some of the more advanced 
>    display/formatting features of CSL also need support from Pandoc output 
>    templates to work correctly. Are we okay with all of that?
>    
> If anyone has any input on these interop problems, I'd love to hear it. At 
> the moment, it looks like the way forward is to replace the pandoc-citeproc 
> binary wholesale, speaking JSON and taking on all the pandoc-specific 
> features in Rust.
>
> Cormac

