That’s a good point about native Lua modules. I’m looking into a safe API for that over at rlua, but it’s clearly possible in unsafe Rust.


The output formatting architecture so far doesn’t use any particular internal format; it’s just a trait (like a Haskell typeclass) with an associated type. So PlainText builds Strings and ignores formatting (and is very fast), while Pandoc builds Vec<Inline>, where Inline comes from the pandoc_types crate. An unsupported CSL formatting instruction like display="block" would simply be ignored in the Pandoc implementation. A fully-featured format would encode everything in an Html type that knows how to serialize itself at the end. I might change this given its code-size implications for the WebAssembly output: Rust's monomorphisation means every formatting-dependent function would be compiled and emitted three times (once per format), with inlining performed on each.
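To make that concrete, here is a rough sketch of the shape I mean. Every name in it (OutputFormat, Formatting, render_title, and the cut-down Inline enum standing in for the one in pandoc_types) is invented for illustration and is not the actual citeproc-rs code:

```rust
// Hypothetical sketch only; names and signatures are assumptions.

/// Stand-in for the Inline type from pandoc_types, cut down to three variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Emph(Vec<Inline>),
    Strong(Vec<Inline>),
}

/// A simplified set of formatting requests a CSL style might make.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Formatting {
    Plain,
    Italic,
    Bold,
    DisplayBlock,
}

/// The trait with an associated type: each format picks what it builds.
pub trait OutputFormat {
    type Build;
    fn text(&self, s: &str, f: Formatting) -> Self::Build;
    fn join(&self, parts: Vec<Self::Build>) -> Self::Build;
}

/// Builds plain Strings and ignores all formatting.
pub struct PlainText;

impl OutputFormat for PlainText {
    type Build = String;
    fn text(&self, s: &str, _f: Formatting) -> String {
        s.to_owned()
    }
    fn join(&self, parts: Vec<String>) -> String {
        parts.concat()
    }
}

/// Builds Vec<Inline>, dropping whatever Inline cannot express.
pub struct Pandoc;

impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn text(&self, s: &str, f: Formatting) -> Vec<Inline> {
        let inner = vec![Inline::Str(s.to_owned())];
        match f {
            Formatting::Italic => vec![Inline::Emph(inner)],
            Formatting::Bold => vec![Inline::Strong(inner)],
            // No inline equivalent of display="block", so it is ignored.
            Formatting::Plain | Formatting::DisplayBlock => inner,
        }
    }
    fn join(&self, parts: Vec<Vec<Inline>>) -> Vec<Inline> {
        parts.into_iter().flatten().collect()
    }
}

/// A formatting-dependent function. The compiler emits one copy of this
/// per format it is used with, which is the code-size concern.
pub fn render_title<O: OutputFormat>(fmt: &O, title: &str) -> O::Build {
    fmt.join(vec![
        fmt.text(title, Formatting::Italic),
        fmt.text(".", Formatting::Plain),
    ])
}
```

Calling render_title(&PlainText, ...) and render_title(&Pandoc, ...) produces two separately compiled functions; a third Html format would add one more copy of everything generic over O.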


In the current architecture, you also have inputs that are generic over the output format. So a Cite is actually specialised for each output format, such that the locators and affixes are specialised too, and any deserialization would be to a Cite<Pandoc>, which will read Pandoc::Build = Vec<Inline> into its affixes. This goes through the serde::Deserialize trait (with serde_json doing the parsing), which is dead easy; the only real cost is keeping it in sync with pandoc's AST. That could be mitigated by doing an incomplete deserialization and leaving unrecognised nodes in serialized form, such that new AST nodes wouldn't cause parse errors. But that's probably more work than the maintenance itself.
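Roughly, the input side looks like the sketch below. Again the names are made up for illustration (the real Cite has more fields, and the serde derives are left out so the snippet needs nothing beyond the standard library):

```rust
// Hypothetical sketch only; field and type names are assumptions.

/// The output trait, reduced to just its associated type.
pub trait OutputFormat {
    type Build;
}

pub struct PlainText;
impl OutputFormat for PlainText {
    type Build = String;
}

/// Stand-in for the Inline type from pandoc_types.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Space,
}

pub struct Pandoc;
impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
}

/// An input type that is generic over the output format: the affixes
/// are stored in whatever the chosen format builds.
pub struct Cite<O: OutputFormat> {
    pub id: String,
    pub prefix: O::Build,
    pub suffix: O::Build,
}

impl<O: OutputFormat> Cite<O> {
    pub fn new(id: &str, prefix: O::Build, suffix: O::Build) -> Self {
        Cite {
            id: id.to_owned(),
            prefix,
            suffix,
        }
    }
}
```

With this shape, a Cite<Pandoc> carries Vec<Inline> affixes straight from the pandoc AST, while a Cite<PlainText> carries Strings, and nothing has to convert between the two.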


The BibTeX parsing is a tricky one, though. There’s this for the main syntax, at least. I wouldn’t want to fork out to Pandoc for every single LaTeX text field, but maybe the Lua API’s read would help here. It might be simpler to support both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX with a simple Rust-based parser, but not at the same time. What do people use backslash commands for in BibTeX? Are there names and document titles out there that really need the whole power of LaTeX to render? I might have to think about this some more. Perhaps the answer is a successor to CSL-JSON that accepts arbitrary JSON objects wherever the old one accepts strings.
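For a sense of scale, a micro-LaTeX parser could be about this small. This is only a sketch under my own assumptions: the supported subset (\emph, \textit, \textbf, bare brace groups, and backslash-escaped symbols like \&) is a guess at what fields need, the Node type and parse_micro_latex are invented names, and accent commands and math are not handled at all:

```rust
// Hypothetical sketch only; the supported subset is an assumption.

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Text(String),
    Emph(Vec<Node>),
    Bold(Vec<Node>),
    /// A bare {...} group, which BibTeX uses to protect capitalisation.
    Group(Vec<Node>),
}

pub fn parse_micro_latex(input: &str) -> Result<Vec<Node>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut pos = 0;
    parse_nodes(&chars, &mut pos, false)
}

/// Moves any pending plain text into the node list.
fn flush(text: &mut String, nodes: &mut Vec<Node>) {
    if !text.is_empty() {
        nodes.push(Node::Text(std::mem::take(text)));
    }
}

/// Parses until end of input, or until the closing brace when in_group.
fn parse_nodes(chars: &[char], pos: &mut usize, in_group: bool) -> Result<Vec<Node>, String> {
    let mut nodes = Vec::new();
    let mut text = String::new();
    while *pos < chars.len() {
        match chars[*pos] {
            '}' if in_group => {
                *pos += 1;
                flush(&mut text, &mut nodes);
                return Ok(nodes);
            }
            '}' => return Err(format!("unmatched '}}' at {}", *pos)),
            '{' => {
                *pos += 1;
                flush(&mut text, &mut nodes);
                nodes.push(Node::Group(parse_nodes(chars, pos, true)?));
            }
            '\\' => {
                *pos += 1;
                let start = *pos;
                while *pos < chars.len() && chars[*pos].is_ascii_alphabetic() {
                    *pos += 1;
                }
                let name: String = chars[start..*pos].iter().collect();
                if name.is_empty() {
                    // Escaped symbol such as \& or \%: keep the symbol itself.
                    if *pos >= chars.len() {
                        return Err("trailing backslash".to_string());
                    }
                    text.push(chars[*pos]);
                    *pos += 1;
                    continue;
                }
                if *pos >= chars.len() || chars[*pos] != '{' {
                    return Err(format!("\\{} needs a braced argument", name));
                }
                *pos += 1;
                flush(&mut text, &mut nodes);
                let arg = parse_nodes(chars, pos, true)?;
                match name.as_str() {
                    "emph" | "textit" => nodes.push(Node::Emph(arg)),
                    "textbf" => nodes.push(Node::Bold(arg)),
                    other => return Err(format!("unsupported command \\{}", other)),
                }
            }
            c => {
                text.push(c);
                *pos += 1;
            }
        }
    }
    if in_group {
        return Err("unclosed '{'".to_string());
    }
    flush(&mut text, &mut nodes);
    Ok(nodes)
}
```

Anything outside the subset comes back as an Err rather than being guessed at, so a caller could fall back to Pandoc only for the fields that actually need it.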



On Wednesday, December 12, 2018 at 5:44:35 AM UTC+11, John MacFarlane wrote:

That's an interesting idea.  pandoc-citeproc is still
pretty crufty, and it doesn't always behave like
citeproc-js, so I can see the point of this.

The difficulties are that

- pandoc-citeproc is currently quite tightly
  integrated with pandoc; it operates on the pandoc
  AST.  So as you note, that capability would have to
  be reproduced somehow in citeproc-rs.  I think that
  the tree-walking work could be given to a lua filter
  that either called out to citeproc-rs or linked to
  a version of it.  (I don't think luajit is required
  for this; one can write lua modules in C, so it
  should be possible to do it in rust.) But citeproc-rs would
  still have to be able to handle pandoc JSON. Perhaps
  that could just be the underlying format it operates
  on (it would have to replace the current HTML-ish
  syntax used in citeproc-js, and maybe it would have
  to be made more expressive).

- One potential problem is that citeproc-rs would need to
  change, sometimes, when pandoc does.  Currently
  that's not a problem since I maintain pandoc-citeproc.

- pandoc-citeproc does some things citeproc-js does
  not do (these are, strictly speaking, extensions to
  standard citeproc).  For example, author-in-text
  citations, citation prefixes and suffixes, proper
  handling of math (that's actually just folded into
  general pandoc support), movement of punctuation,
  conversion from bibtex/biblatex and other formats.
  Note that conversion from bibtex relies on pandoc's
  latex parser; to reproduce this functionality, you'd
  have to write a latex parser in rust or somehow call
  out to pandoc.

Best,
John
