That’s a good point about native Lua modules. I’m looking into a safe API for that over at rlua, but it’s clearly possible in unsafe Rust.
The output formatting architecture so far doesn’t actually use any particular internal format; it’s just a trait (like a Haskell typeclass) with an associated type. So PlainText builds Strings and ignores formatting (and is very fast), but Pandoc builds Vec<Inline>, where Inline comes from the pandoc_types crate. So an unsupported CSL formatting instruction like display="block" would be simply ignored in the Pandoc implementation. A fully-featured format would encode everything in an Html type that knows how to serialize itself at the end. I might change this, given its code size implications for the WebAssembly output, as Rust's monomorphisation means all formatting-dependent functions would be compiled and emitted three times with inlining performed on each.
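As a rough illustration of the trait-with-associated-type design described above, here is a minimal sketch. The method names (plain, italic, seq), the render_title helper, and the local Inline enum are assumptions made for the example; the real Inline lives in pandoc_types and has many more variants, and the actual trait in citeproc-rs may look quite different.

```rust
/// Hypothetical, cut-down output format trait: each format picks its own
/// `Build` type, so there is no shared intermediate representation.
trait OutputFormat {
    type Build;
    fn plain(&self, s: &str) -> Self::Build;
    fn italic(&self, inner: Self::Build) -> Self::Build;
    fn seq(&self, parts: Vec<Self::Build>) -> Self::Build;
}

/// Builds plain `String`s and drops all formatting.
struct PlainText;

impl OutputFormat for PlainText {
    type Build = String;
    fn plain(&self, s: &str) -> String {
        s.to_owned()
    }
    fn italic(&self, inner: String) -> String {
        inner // formatting is ignored
    }
    fn seq(&self, parts: Vec<String>) -> String {
        parts.concat()
    }
}

/// Local stand-in for `pandoc_types`' `Inline` (assumption: two variants only).
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
    Emph(Vec<Inline>),
}

/// Builds a list of pandoc-style inlines.
struct Pandoc;

impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn plain(&self, s: &str) -> Vec<Inline> {
        vec![Inline::Str(s.to_owned())]
    }
    fn italic(&self, inner: Vec<Inline>) -> Vec<Inline> {
        vec![Inline::Emph(inner)]
    }
    fn seq(&self, parts: Vec<Vec<Inline>>) -> Vec<Inline> {
        parts.into_iter().flatten().collect()
    }
}

/// A formatting-dependent function. The compiler emits one copy of this
/// per output format it is used with, which is the code size concern.
fn render_title<O: OutputFormat>(fmt: &O, title: &str) -> O::Build {
    fmt.seq(vec![fmt.italic(fmt.plain(title)), fmt.plain(".")])
}
```

Calling render_title(&PlainText, "Title") yields a String, while render_title(&Pandoc, "Title") yields a Vec<Inline>; both calls go through the same generic source but compile to separate machine code.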
In the current architecture, you also have inputs that are generic over the output format. So a Cite is actually specialised for each output format, such that the locators and affixes are specialised, and any deserialization would be to a Cite<Pandoc>, which will read Pandoc::Build = Vec<Inline> into its affixes. This goes through serde's Deserialize trait, which is dead easy; the maintenance just boils down to keeping the types in sync with pandoc's AST. That could be mitigated by doing an incomplete deserialization and leaving unrecognised nodes in serialized form, so that new AST nodes wouldn't cause parse errors. But that's probably more work than the maintenance in the first place.
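To make the "inputs generic over the output format" idea concrete, here is a small std-only sketch. The field names, the basic and with_prefix constructors, and the one-variant Inline enum are assumptions for illustration, not the actual citeproc-rs definitions. Deserialization is left out to keep the example dependency-free; with serde, the affix fields would presumably be deserialized as O::Build.

```rust
/// Hypothetical minimal output format trait (see the assumptions above).
trait OutputFormat {
    type Build;
    fn plain(s: &str) -> Self::Build;
}

struct PlainText;
impl OutputFormat for PlainText {
    type Build = String;
    fn plain(s: &str) -> String {
        s.to_owned()
    }
}

/// Local stand-in for `pandoc_types`' `Inline`.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Str(String),
}

struct Pandoc;
impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn plain(s: &str) -> Vec<Inline> {
        vec![Inline::Str(s.to_owned())]
    }
}

/// A cite whose affixes are stored in the output format's own type,
/// so `Cite<Pandoc>` carries `Vec<Inline>` and `Cite<PlainText>` carries `String`.
struct Cite<O: OutputFormat> {
    id: String,
    prefix: Option<O::Build>,
    suffix: Option<O::Build>,
}

impl<O: OutputFormat> Cite<O> {
    /// A cite with no affixes.
    fn basic(id: &str) -> Self {
        Cite { id: id.to_owned(), prefix: None, suffix: None }
    }

    /// Attach a prefix, converted into the format's build type.
    fn with_prefix(mut self, text: &str) -> Self {
        self.prefix = Some(O::plain(text));
        self
    }
}
```

The trade-off is visible in the types: a Cite<Pandoc> can carry rich pre-formatted affixes straight from a pandoc document, but it can only be built (or deserialized) once the output format is known.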
The BibTeX parsing is a tricky one, though. There’s this for the main syntax, at least. I wouldn’t want to fork out to Pandoc for every single LaTeX text field, but maybe the Lua API’s read would help here. It might be simpler to support both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX with a simple Rust-based parser, but not at the same time. What do people use backslash commands for in BibTeX? Are there names and document titles out there that really need the whole power of LaTeX to render? I might have to think about this some more. Perhaps the answer is a successor to CSL-JSON that accepts arbitrary JSON objects wherever the old one accepts strings.
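For a sense of scale, a "micro-LaTeX" parser of the kind floated above could be quite small. The sketch below is only a guess at a useful subset: \emph, \textit, \textbf, brace groups (used for case protection in BibTeX), and single-character escapes like \&. The Node type and the function names are invented for the example, and anything else (math, accents, macros) is deliberately out of scope.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical mini-AST for the supported subset.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Emph(Vec<Node>),
    Strong(Vec<Node>),
    /// A bare `{...}` group, e.g. BibTeX case protection.
    Protected(Vec<Node>),
}

fn parse_micro_latex(input: &str) -> Vec<Node> {
    let mut chars = input.chars().peekable();
    parse_nodes(&mut chars, false)
}

/// Flush any buffered plain text into the output list.
fn push_text(out: &mut Vec<Node>, buf: &mut String) {
    if !buf.is_empty() {
        out.push(Node::Text(std::mem::take(buf)));
    }
}

/// Parse until end of input, or until the closing `}` when inside a group.
fn parse_nodes(chars: &mut Peekable<Chars>, in_group: bool) -> Vec<Node> {
    let mut out = Vec::new();
    let mut buf = String::new();
    while let Some(c) = chars.next() {
        match c {
            '}' if in_group => break,
            '{' => {
                push_text(&mut out, &mut buf);
                out.push(Node::Protected(parse_nodes(chars, true)));
            }
            '\\' => {
                // Read the command name: a run of ASCII letters.
                let mut name = String::new();
                while let Some(&l) = chars.peek() {
                    if !l.is_ascii_alphabetic() {
                        break;
                    }
                    name.push(l);
                    chars.next();
                }
                let has_arg = chars.peek() == Some(&'{');
                match (name.as_str(), has_arg) {
                    ("emph", true) | ("textit", true) => {
                        chars.next(); // consume '{'
                        push_text(&mut out, &mut buf);
                        out.push(Node::Emph(parse_nodes(chars, true)));
                    }
                    ("textbf", true) => {
                        chars.next(); // consume '{'
                        push_text(&mut out, &mut buf);
                        out.push(Node::Strong(parse_nodes(chars, true)));
                    }
                    ("", _) => {
                        // Escaped symbol such as \& or \%: keep the symbol.
                        if let Some(sym) = chars.next() {
                            buf.push(sym);
                        }
                    }
                    _ => {
                        // Unknown command: keep the source text verbatim.
                        buf.push('\\');
                        buf.push_str(&name);
                    }
                }
            }
            other => buf.push(other),
        }
    }
    push_text(&mut out, &mut buf);
    out
}
```

Unknown commands are passed through as literal text rather than rejected, which seems like the safer default for bibliography fields of unknown provenance; whether that is acceptable depends on the answer to the question of what people actually put in their BibTeX.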
That's an interesting idea. pandoc-citeproc is still
pretty crufty, and it doesn't always behave like
citeproc-js, so I can see the point of this.
The difficulties are that
- pandoc-citeproc is currently quite tightly
integrated with pandoc; it operates on the pandoc
AST. So as you note, that capability would have to
be reproduced somehow in citeproc-rs. I think that
the tree-walking work could be given to a Lua filter
that either called out to citeproc-rs or linked to
a version of it. (I don't think LuaJIT is required
for this; one can write Lua modules in C, so it
should be possible to do it in Rust.) But citeproc-rs would
still have to be able to handle pandoc JSON. Perhaps
that could just be the underlying format it operates
on (it would have to replace the current HTML-ish
syntax used in citeproc-js, and maybe it would have
to be made more expressive).
- One potential problem is that citeproc-rs would need to
change, sometimes, when pandoc does. Currently
that's not a problem since I maintain pandoc-citeproc.
- pandoc-citeproc does some things citeproc-js does
not do (these are, strictly speaking, extensions to
standard citeproc). For example, author-in-text
citations, citation prefixes and suffixes, proper
handling of math (that's actually just folded into
general pandoc support), movement of punctuation,
conversion from BibTeX/BibLaTeX and other formats.
Note that conversion from BibTeX relies on pandoc's
LaTeX parser; to reproduce this functionality, you'd
have to write a LaTeX parser in Rust or somehow call
out to pandoc.
Best,
John