* Experimental citeproc implementation in Rust
From: Cormac Relf @ 2018-12-11 17:04 UTC
To: pandoc-discuss

Hi,

I've been working on https://github.com/cormacrelf/citeproc-rs, an experimental new CSL and CSL-M citation processor written in Rust. Its one tracking issue gives a rough overview of how early this is in development. It can't do name blocks yet, let alone disambiguation or structured bibliographies, but there are promising foundations. The coolest feature so far is the error reporting at parse time: try running it on a style with errors like <number variable="issued" />. Contributions or support would be welcome.

I'm raising it here because an interesting possibility could come out of it that touches the Pandoc platform.

- It could *replace citeproc-js* by compiling to WebAssembly that would run in Zotero, browsers and Node.
  - This is one good reason to use Rust, which has excellent WASM tooling. I have nothing against Haskell or working on pandoc-citeproc directly, but Haskell WASM support is just not there yet.
- It could *feasibly also replace pandoc-citeproc*, and in fact can already build some pandoc JSON output.
- It could feasibly *also* replace almost *every other citeproc* by exposing a native static library on every target the Rust/LLVM ecosystem supports. That could be wrapped in e.g. PHP, Ruby, Python, and Java, which all have FFI support. It's weird to me that nobody has built a lingua-franca native library yet, given how complex the specification is. It's a similar situation to libxml2 or libgit2: big, complex, but solve-once-use-everywhere.
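For the native-library route, each host language's FFI would bind to plain `extern "C"` functions. The sketch below is a hypothetical illustration, not citeproc-rs code. The names `citeproc_render_plain` and `citeproc_string_free` are invented, and the "rendering" is a placeholder. It shows one common way to pass an owned string across the boundary without leaking it: Rust allocates with `CString::into_raw`, and the caller must return the pointer to a Rust-side free function.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point: takes a NUL-terminated UTF-8 cite key and returns
/// a newly allocated, NUL-terminated string that the caller now owns.
/// Returns null if the input is null or not valid UTF-8.
// On the Rust 2024 edition this attribute is written #[unsafe(no_mangle)].
#[no_mangle]
pub extern "C" fn citeproc_render_plain(key: *const c_char) -> *mut c_char {
    if key.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller promises `key` points to a valid NUL-terminated string.
    let key = unsafe { CStr::from_ptr(key) };
    let key = match key.to_str() {
        Ok(s) => s,
        Err(_) => return std::ptr::null_mut(),
    };
    // Placeholder output; a real processor would format a citation here.
    let rendered = format!("({})", key);
    match CString::new(rendered) {
        // Ownership moves to the caller until citeproc_string_free is called.
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(), // interior NUL byte
    }
}

/// Takes back a string returned by `citeproc_render_plain` so that the
/// allocator that created it also frees it.
#[no_mangle]
pub extern "C" fn citeproc_string_free(s: *mut c_char) {
    if !s.is_null() {
        // SAFETY: `s` must come from CString::into_raw and must not be freed twice.
        unsafe { drop(CString::from_raw(s)) };
    }
}
```

Pairing every returned pointer with an explicit free function keeps allocation and deallocation in the same allocator. That matters once PHP, Ruby, Python or Java wrappers sit on the other side.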
That's one ring to rule them all: a single codebase, fewer competing implementations, more uniform output across CSL tools, and less work for the community on both bug fixing and CSL evolution. There are also long-standing bugs in pandoc-citeproc and citeproc-js that I'm aiming to fix in the process. Alongside those, I want to rework some of the less complete or less thought-out extended features, like citeproc-js' abbreviations or the fairly hacky and rigid author suppression in both pandoc and citeproc-js.

The second point of that evil plan, replacing pandoc-citeproc, is a bit tricky and might need some thinking through, given that:

- Using FFI from a Haskell pandoc-citeproc that handles the Pandoc parts is a bit... I don't know.
  - Imagine: pandoc-citeproc deserializes a big JSON document, walks it, parses [@doe, 31] syntax and collects a bunch of cites (with cite IDs attached). It then hands the rest of the job over FFI, and the other end attaches pandoc JSON to the relevant points. There would be quite a lot of awkward conversion and serialization in this, because Text.Pandoc.Definition doesn't (and shouldn't) provide a C-ABI-compatible memory layout, but it might work.
- You could replace the entire pandoc-citeproc JSON filter with a new binary, but the Lua API exists for a reason. If a lot of work is going on anyway, maybe avoiding the double JSON conversion should be one of the goals. Should that be written as a Lua FFI wrapper around citeproc-rs (i.e. around the libciteproc static library it builds)? Set aside the tricky problem of returning owned data structures over FFI without leaking memory. FFI is still only available with LuaJIT, which as I understand it would have to become a system dependency for Pandoc through an hslua constraint that official Pandoc builds have not specified so far.
  Alternatively, it wouldn't be too hard to maintain a JSON filter for non-LuaJIT installs, but having two ways of doing this depending on platform or configuration would certainly confuse users. Maybe JSON is good enough, and maybe serde_json is so fast it won't matter in the end. It would certainly be much simpler.
- pandoc-citeproc includes syntax parsing that effectively defines part of Pandoc Markdown (e.g. [@doe, 33]), so this would move that syntax further out of tree than it already is. There is at least a good parser combinator library (nom) that could replicate the Parsec code in a way that's fairly comprehensible to Haskell developers. Some of the more advanced display/formatting features of CSL also need support from Pandoc output templates to work correctly. Are we okay with all of that?

If anyone has any input on these interop problems, I'd love to hear it. At the moment, it looks like the way forward is to replace the pandoc-citeproc binary wholesale, speaking JSON and taking on all the pandoc-specific features in Rust.

Cormac

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/78b7f42d-7640-45ff-a359-f59355217af8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
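The `[@doe, 33]` syntax is small enough to show what a port of the Parsec code has to cover. The following is a hand-written, std-only Rust sketch rather than nom code. `CiteItem` and `parse_bracketed` are hypothetical names. It handles only bracketed cites with optional `-@` author suppression and a locator after the first comma. It rejects prefixes and escaping, and it does not attempt author-in-text forms.

```rust
/// One cite inside a bracketed Pandoc citation such as `[@doe, 33]`.
#[derive(Debug, PartialEq)]
pub struct CiteItem {
    pub key: String,
    /// `-@key` suppresses the author name.
    pub suppress_author: bool,
    /// Everything after the first comma, trimmed (locator plus any suffix).
    pub locator: Option<String>,
}

/// Parses `[@doe, 33]` or `[-@doe; @roe, ch. 2]`. Returns None for anything
/// else, including prefixes like `[see @doe]`, which this sketch does not handle.
pub fn parse_bracketed(input: &str) -> Option<Vec<CiteItem>> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    let mut items = Vec::new();
    for part in inner.split(';') {
        let part = part.trim();
        let (suppress_author, rest) = match part.strip_prefix('-') {
            Some(r) => (true, r),
            None => (false, part),
        };
        let rest = rest.strip_prefix('@')?;
        let (key, locator) = match rest.find(',') {
            Some(i) => (&rest[..i], Some(rest[i + 1..].trim().to_string())),
            None => (rest, None),
        };
        // Rough approximation of the characters Pandoc allows in citation keys.
        let valid = |c: char| c.is_alphanumeric() || "_:.#$%&-+?<>~/".contains(c);
        if key.is_empty() || !key.chars().all(valid) {
            return None;
        }
        items.push(CiteItem { key: key.to_string(), suppress_author, locator });
    }
    Some(items)
}
```

For example, `parse_bracketed("[-@doe; @roe, ch. 2]")` yields two items: `doe` with author suppression and no locator, and `roe` with the locator `ch. 2`.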
* Re: Experimental citeproc implementation in Rust
From: John MacFarlane @ 2018-12-11 18:44 UTC
To: Cormac Relf, pandoc-discuss

That's an interesting idea. pandoc-citeproc is still pretty crufty, and it doesn't always behave like citeproc-js, so I can see the point of this.

The difficulties are that

- pandoc-citeproc is currently quite tightly integrated with pandoc; it operates on the pandoc AST. So as you note, that capability would have to be reproduced somehow in citeproc-rs. I think that the tree-walking work could be given to a lua filter that either called out to citeproc-rs or linked to a version of it. (I don't think luajit is required for this; one can write lua modules in C, so it should be possible to do it in rust.) But citeproc-rs would still have to be able to handle pandoc JSON. Perhaps that could just be the underlying format it operates on (it would have to replace the current HTML-ish syntax used in citeproc-js, and maybe it would have to be made more expressive).

- One potential problem is that citeproc-rs would need to change, sometimes, when pandoc does. Currently that's not a problem since I maintain pandoc-citeproc.

- pandoc-citeproc does some things citeproc-js does not do (these are, strictly speaking, extensions to standard citeproc). For example, author-in-text citations, citation prefixes and suffixes, proper handling of math (that's actually just folded into general pandoc support), movement of punctuation, conversion from bibtex/biblatex and other formats. Note that conversion from bibtex relies on pandoc's latex parser; to reproduce this functionality, you'd have to write a latex parser in rust or somehow call out to pandoc.
Best,
John
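Whichever side does the tree walk (a Lua filter, as suggested above, or Rust), it takes two passes. The first collects every citation cluster in document order and hands the clusters to the processor. The second writes the rendered output back into the same positions. A minimal Rust sketch over a deliberately tiny, hypothetical `Inline` type follows. The real pandoc-types definition has many more variants and carries full `Citation` records, not bare keys.

```rust
/// Tiny, hypothetical stand-in for Pandoc's Inline type. The real
/// pandoc-types definition has many more variants and full Citation records.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Emph(Vec<Inline>),
    /// Cite keys in this cluster, plus the rendered content (empty until filled).
    Cite(Vec<String>, Vec<Inline>),
}

/// Pass 1: collect each citation cluster's keys in document order, since
/// position-dependent CSL output (ibid, subsequent cites) relies on that order.
pub fn collect_cites(inlines: &[Inline], out: &mut Vec<Vec<String>>) {
    for inline in inlines {
        match inline {
            Inline::Str(_) => {}
            Inline::Emph(children) => collect_cites(children, out),
            Inline::Cite(keys, _) => out.push(keys.clone()),
        }
    }
}

/// Pass 2: walk in the same order, replacing each Cite's content with the
/// next rendered cluster produced by the processor.
pub fn fill_cites<I: Iterator<Item = Vec<Inline>>>(inlines: &mut [Inline], rendered: &mut I) {
    for inline in inlines.iter_mut() {
        match inline {
            Inline::Str(_) => {}
            Inline::Emph(children) => fill_cites(children, &mut *rendered),
            Inline::Cite(_, content) => {
                if let Some(output) = rendered.next() {
                    *content = output;
                }
            }
        }
    }
}
```

Both passes must visit nodes in the same order. Otherwise the rendered clusters would be attached to the wrong citations.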
* Re: Experimental citeproc implementation in Rust
From: Cormac Relf @ 2018-12-12 9:21 UTC
To: pandoc-discuss

That's a good point about native Lua modules. I'm looking into a safe API for that over at rlua <https://github.com/kyren/rlua/issues/105>, but it's clearly possible in unsafe Rust.

The output formatting architecture so far doesn't actually use any particular internal format; it's just a trait (like a Haskell typeclass) with an associated type. So PlainText builds Strings and ignores formatting (and is very fast), but Pandoc builds Vec<Inline>, where Inline comes from the pandoc_types crate. An unsupported CSL formatting instruction like display="block" would therefore simply be ignored in the Pandoc implementation. A fully-featured format would encode everything in an Html type that knows how to serialize itself at the end. I might change this, given its code size implications for the WebAssembly output. Rust's monomorphisation means all formatting-dependent functions would be compiled and emitted three times, with inlining performed on each.

In the current architecture, *inputs* are also generic over the output format. So a Cite is actually specialised for each output format, and so are its locators and affixes. Any deserialization would be to a Cite<Pandoc>, which reads Pandoc::Build = Vec<Inline> into its affixes. This goes through serde's Deserialize trait (via serde_json), which is pretty dead easy; the maintenance just boils down to keeping it in sync with pandoc's AST. That could be mitigated by doing an incomplete deserialization and leaving unrecognised nodes in serialized form, so that new AST nodes wouldn't cause parse errors.
But that's probably more work than the maintenance it would save.

The BibTeX parsing is a tricky one, though. There's nom-bibtex <https://github.com/charlesvdv/nom-bibtex> for the main syntax, at least. I wouldn't want to fork out to Pandoc for every single LaTeX text field, but maybe the Lua API's read would help here. It might be simpler to support both citeproc-js' micro-HTML and a similarly limited micro-LaTeX with a simple Rust-based parser, but not at the same time. What do people use backslash commands for in BibTeX? Are there names and document titles out there that really need the whole power of LaTeX to render? I might have to think about this some more. Perhaps the answer is a successor to CSL-JSON that accepts arbitrary JSON objects wherever the old one accepts strings.
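A rough sketch of the trait-with-an-associated-type design described above follows. It is not citeproc-rs's actual code. Apart from `Build`, the names (`OutputFormat`, `text`, `italic`, `seq`, `render_title`) are assumptions, and `Inline` is a two-variant stand-in for the pandoc_types type.

```rust
/// Hypothetical output-format trait: each format picks its own
/// intermediate representation through the associated type `Build`.
pub trait OutputFormat {
    type Build: Clone;
    fn text(&self, s: &str) -> Self::Build;
    fn italic(&self, inner: Self::Build) -> Self::Build;
    fn seq(&self, parts: Vec<Self::Build>) -> Self::Build;
}

/// Plain text: builds Strings and throws formatting away.
pub struct PlainText;

impl OutputFormat for PlainText {
    type Build = String;
    fn text(&self, s: &str) -> String { s.to_string() }
    fn italic(&self, inner: String) -> String { inner }
    fn seq(&self, parts: Vec<String>) -> String { parts.concat() }
}

/// Minimal stand-in for pandoc_types' Inline (not its real definition).
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Emph(Vec<Inline>),
}

pub struct Pandoc;

impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn text(&self, s: &str) -> Vec<Inline> { vec![Inline::Str(s.to_string())] }
    fn italic(&self, inner: Vec<Inline>) -> Vec<Inline> { vec![Inline::Emph(inner)] }
    fn seq(&self, parts: Vec<Vec<Inline>>) -> Vec<Inline> {
        parts.into_iter().flatten().collect()
    }
}

/// An input that is generic over the output format: its affixes are
/// already stored in that format's Build type.
pub struct Cite<O: OutputFormat> {
    pub key: String,
    pub prefix: O::Build,
    pub suffix: O::Build,
}

/// A formatting-dependent function; one copy is compiled per output format.
pub fn render_title<O: OutputFormat>(fmt: &O, cite: &Cite<O>, title: &str) -> O::Build {
    fmt.seq(vec![cite.prefix.clone(), fmt.italic(fmt.text(title)), cite.suffix.clone()])
}
```

`render_title` is generic, so the compiler emits a separate copy for each `O` it is used with. That is the monomorphisation cost to code size mentioned above for the WebAssembly build. A `Cite<Pandoc>` carries `Vec<Inline>` affixes, while a `Cite<PlainText>` carries `String`s.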
* Re: Experimental citeproc implementation in Rust
From: John MacFarlane @ 2018-12-12 20:38 UTC
To: Cormac Relf, pandoc-discuss

Cormac Relf writes:

> The BibTex parsing is a tricky one, though. There’s this
> <https://github.com/charlesvdv/nom-bibtex> for the main syntax, at least. I
> wouldn’t want to fork out to Pandoc for every single latex text field, but
> maybe the Lua API’s read would help here.

That's an interesting point. If your parser just parsed the fields as RawInline (Format "latex"), you could have the lua filter do a separate pass at the beginning to try to convert all of these into native pandoc inlines using read.

> It might be simpler to support
> both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX with a
> simple Rust-based parser, but not at the same time. What do people use
> backslash commands for in BibTeX? Are there names and document titles out
> there that really need the whole power of LaTeX to render? I might have to
> think about this some more. Perhaps a successor to CSL-JSON that accepts
> arbitrary JSON objects wherever the old one accepts strings.

In practice, a fairly small subset of LaTeX would be enough to handle most of what you find in bibtex bibliographies.

Certainly you will find things like `\emph`, inline math, and lots of escape characters like `\"{a}`.
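John's point that a small LaTeX subset goes a long way can be made concrete. This is a hypothetical, std-only Rust sketch; `Node` and `parse_micro_latex` are invented names, not a pandoc or citeproc-rs API. It covers `\emph` and `\textit`, `$...$` math kept as raw TeX, and a handful of `\"{a}`-style accents. Bare braces are simply flattened instead of being treated as case protection. Unknown commands pass through verbatim, so nothing disappears silently.

```rust
/// Output of the sketch: plain text, emphasis, or raw TeX math.
#[derive(Debug, PartialEq)]
pub enum Node {
    Text(String),
    Emph(Vec<Node>),
    Math(String), // contents of $...$, left for a math renderer
}

/// Combines an accent command with a base letter, for a handful of cases only.
fn accent(cmd: char, base: char) -> Option<char> {
    const TABLE: &[(char, char, char)] = &[
        ('"', 'a', 'ä'), ('"', 'o', 'ö'), ('"', 'u', 'ü'),
        ('\'', 'e', 'é'), ('`', 'e', 'è'), ('^', 'o', 'ô'),
    ];
    TABLE.iter().find(|t| t.0 == cmd && t.1 == base).map(|t| t.2)
}

/// Appends text, merging it into a preceding Text node.
fn push_text(nodes: &mut Vec<Node>, s: &str) {
    if let Some(Node::Text(t)) = nodes.last_mut() {
        t.push_str(s);
    } else {
        nodes.push(Node::Text(s.to_string()));
    }
}

pub fn parse_micro_latex(input: &str) -> Vec<Node> {
    let chars: Vec<char> = input.chars().collect();
    let mut pos = 0;
    parse_until(&chars, &mut pos, None)
}

/// Parses until the closing `stop` character (a `}`) or the end of input.
fn parse_until(chars: &[char], pos: &mut usize, stop: Option<char>) -> Vec<Node> {
    let mut nodes = Vec::new();
    while *pos < chars.len() {
        let c = chars[*pos];
        *pos += 1;
        match c {
            _ if Some(c) == stop => return nodes,
            // Bare braces (BibTeX case protection) are simply flattened here.
            '{' => {
                for n in parse_until(chars, pos, Some('}')) {
                    match n {
                        Node::Text(t) => push_text(&mut nodes, &t),
                        other => nodes.push(other),
                    }
                }
            }
            '$' => {
                let start = *pos;
                while *pos < chars.len() && chars[*pos] != '$' {
                    *pos += 1;
                }
                nodes.push(Node::Math(chars[start..*pos].iter().collect()));
                if *pos < chars.len() {
                    *pos += 1; // skip the closing $
                }
            }
            '\\' => {
                // Command name: a run of letters, or a single non-letter symbol.
                let start = *pos;
                while *pos < chars.len() && chars[*pos].is_ascii_alphabetic() {
                    *pos += 1;
                }
                if *pos == start && *pos < chars.len() {
                    *pos += 1;
                }
                let name: String = chars[start..*pos].iter().collect();
                match name.as_str() {
                    "emph" | "textit" if chars.get(*pos) == Some(&'{') => {
                        *pos += 1;
                        nodes.push(Node::Emph(parse_until(chars, pos, Some('}'))));
                    }
                    _ if name.chars().count() == 1 => {
                        let cmd = name.chars().next().unwrap();
                        let braced = chars.get(*pos) == Some(&'{');
                        let base_at = if braced { *pos + 1 } else { *pos };
                        match chars.get(base_at).and_then(|&b| accent(cmd, b)) {
                            Some(r) => {
                                *pos = base_at + 1;
                                if braced && chars.get(*pos) == Some(&'}') {
                                    *pos += 1;
                                }
                                push_text(&mut nodes, &r.to_string());
                            }
                            // Escaped symbols such as \& or \% come out literally.
                            None => push_text(&mut nodes, &name),
                        }
                    }
                    // Unknown commands are kept verbatim rather than dropped.
                    _ => push_text(&mut nodes, &format!("\\{}", name)),
                }
            }
            _ => push_text(&mut nodes, &c.to_string()),
        }
    }
    nodes
}
```

Math comes out as raw TeX so a later stage can decide what to do with it, such as emitting a Pandoc Math inline.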
* Re: Experimental citeproc implementation in Rust
From: Paulo Ney de Souza @ 2018-12-12 21:07 UTC
To: pandoc-discuss; Cc: Cormac Relf

Here is a sample of elaborate titles that "really" happen in the wild; these are taken from the Proceedings of the ICM 2018:

title = {{$\bold Z$-theory: chasing ${\mathfrak m}/f$ theory}},
title = {New examples of complete Calabi--Yau metrics on $\mathbb{C}^n$ for $n\ge 3$},
title = {Uniqueness of the group measure space decomposition for {P}opa's {$\mathscr{HT}$} factors},
title = {Actions of {$\mathbb F_\infty$} whose {${\rm II}_1$} factors and orbit equivalence relations have prescribed fundamental group},
title = {Cocycle and orbit superrigidity for lattices in {${\rm SL}(n,\mathbb R)$} acting on homogeneous spaces},
title = {Profinite rigidity of $\mathbf{PGL}(2,{\Z}[\omega])$ and $\mathbf{PSL}(2,{\Z}[\omega])$},
title = {Representation of measures with polynomial denseness in {$L_p(\mathbb R,d\mu)$}, {$0<p<\infty$}, and its application to determinate moment problems},
title = {Zimmer's conjecture for actions of $\mathrm{SL}(m,\mathbb{Z})$},
title = {Can lattices in {${\rm SL}(n,\mathbb R)$} act on the circle?},
title = {Higher {T}eichm\"uller spaces: from {${\rm SL}(2,\mathbb R)$} to other {L}ie groups},
title = {Exponential decay of connection probabilities for subcritical Voronoi percolation in $\mathbb{R}^d$},
title = {A {KAM} scheme for {${\rm SL}(2,\mathbb R)$} cocycles with {L}iouvillean frequencies},
title = {On dynamics of {$Out(F_n)$} on {$\mathrm{PSL}_2({\mathbb C})$} characters},
title = {General topology meets model theory, on {$\mathfrak p$} and {$\mathfrak t$}},
title = {New classes of {${\mathcal L}\sp{p}$}-spaces},
title = {A class of special {${\mathcal L}\sb{\infty }$}\ spaces},
title = {More {$\ell_r$} saturated {$\mathscr L^\infty$} spaces},
title = {The {${\mathcal L}\sb{p}$} spaces},
title = {A remark on bases in {${\mathcal L}\sb{p}$}-spaces with an application to complementably universal {${\mathcal L}\sb{\infty }$}-spaces},
title = {{${\rm SL}(2,\mathbb C)$} {C}hern-{S}imons theory and the asymptotic behavior of the colored {J}ones polynomial},
title = {K-polystability of {${\mathbb Q}$}-{F}ano varieties admitting {K}\"ahler-{E}instein metrics},
title = {Weak geodesic rays in the space of {K}\"ahler potentials and the class {$\mathcal{E}(X,\omega)$}},
title = {Operator-algebraic superridigity for {${\rm SL}_n(\mathbb Z)$}, {$n\geq 3$}},
title = {The space of closed subgroups of {$\mathbb R^n$} is stratified and simply connected},
title = {The irreducible representations of the {L}ie algebra {${\mathfrak s}{\mathfrak l}(2)$}\ and of the {W}eyl algebra},
title = {Singular {G}elfand-{T}setlin modules of {${\mathfrak{gl}}(n)$}},
title = {Families of irreducible singular Gelfand-Tsetlin modules of $\mathfrak{gl}(n)$},
title = {Infinite-dimensional representations of the {L}ie algebra {$\mathfrak{gl}(n,{\mathbb C})$} related to complex analogs of the {G}elfand-{T}setlin patterns and general hypergeometric functions on the {L}ie group {${\rm GL}(n,{\mathbb C})$}},
title = {A geometric approach to 1-singular {G}elfand--{T}setlin {$\mathfrak{gl}_n$}-modules},
title = {Geometric approach to $p$-singular Gelfand--Tsetlin $\mathfrak {gl}_n$-modules},
title = {On some {B}ruhat decomposition and the structure of the {H}ecke rings of {${\mathfrak p}$}-adic {C}hevalley groups},
title = {Stable $s$-minimal cones in $\mathbb{R}^3$ are flat for $s\sim 1$},
title = {Delaunay type domains for an overdetermined elliptic problem in {$\mathbb S^n\times\mathbb R$} and {$\Bbb H^n\times\Bbb R$}},
Paulo Ney
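Titles like these mix two things a small parser has to tell apart. One is `$...$` math. The other is BibTeX brace groups such as `{P}opa` or `{KAM}`, which by BibTeX convention shield their contents from a style's case changes. The hypothetical std-only sketch below splits a title into protected and unprotected runs and lowercases only the unprotected ones. `Segment`, `split_protected` and `lowercase_unprotected` are invented names. The sketch does not interpret the LaTeX inside the protected runs.

```rust
/// A run of title text; `protected` runs must not be re-cased.
#[derive(Debug, PartialEq)]
pub struct Segment {
    pub text: String,
    pub protected: bool,
}

/// Moves the buffered text into a new segment, if there is any.
fn flush(out: &mut Vec<Segment>, buf: &mut String, protected: bool) {
    if !buf.is_empty() {
        out.push(Segment { text: std::mem::replace(buf, String::new()), protected });
    }
}

/// Splits a BibTeX title on top-level brace groups and top-level `$...$` math.
pub fn split_protected(title: &str) -> Vec<Segment> {
    let mut out = Vec::new();
    let mut buf = String::new();
    let mut depth = 0usize; // brace nesting depth
    let mut in_math = false; // inside top-level $...$
    for c in title.chars() {
        if in_math {
            buf.push(c);
            if c == '$' {
                in_math = false;
                flush(&mut out, &mut buf, true);
            }
            continue;
        }
        match c {
            '{' if depth == 0 => {
                flush(&mut out, &mut buf, false);
                depth = 1;
            }
            '}' if depth == 1 => {
                flush(&mut out, &mut buf, true);
                depth = 0;
            }
            // Inner braces belong to the protected text, e.g. {${\rm II}_1$}.
            '{' => {
                depth += 1;
                buf.push(c);
            }
            '}' if depth > 1 => {
                depth -= 1;
                buf.push(c);
            }
            '$' if depth == 0 => {
                flush(&mut out, &mut buf, false);
                in_math = true;
                buf.push(c);
            }
            _ => buf.push(c),
        }
    }
    // Leftovers from unbalanced input are kept, protected if still inside a group.
    flush(&mut out, &mut buf, depth > 0 || in_math);
    out
}

/// Lowercases only unprotected runs, as a lowercase-style transform might.
pub fn lowercase_unprotected(title: &str) -> String {
    split_protected(title)
        .into_iter()
        .map(|s| if s.protected { s.text } else { s.text.to_lowercase() })
        .collect()
}
```

Outer protecting braces are dropped from the output, while math keeps its `$` delimiters so that a later micro-LaTeX pass can still recognise it.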
* Re: Experimental citeproc implementation in Rust
From: Cormac Relf @ 2018-12-13 4:02 UTC
To: pandoc-discuss

Thank you for this; it's instructive. I can't say I'm enthusiastic about starting a parser from scratch, even a limited one.

After thinking some more, I've concluded that calling Lua's read function won't really work either. The Rust code still has to run outside a Lua context: in JSON mode, as a BibTeX <-> CSL-JSON <-> YAML metadata converter, and in the role pandoc-citeproc plays in things like vim-pandoc's citekey completion. These are all fairly non-negotiable features. (Also, because Rust code would be parsing the reference data, you would need a way to call back into the Lua VM, which is more complicated than one-way Lua-to-Rust calls.)

It might actually be possible to emulate the current pandoc-citeproc approach (Text.CSL.Compat) of linking against Pandoc itself. There is some promising work (broken but fixable) over at https://github.com/mgattozzi/curryrs on linking the GHC runtime and simplifying interop using both type systems. So it might be possible to define some FFI-able Inline types in a small Haskell wrapper library, much like today's Text.CSL.Compat, and link it in. With link-time optimisation on both sides, hopefully that wouldn't even pull in much of Pandoc. That way, users committed to pandoc output might even get to choose their input syntax through a metadata option: you could use the Pandoc HTML parser instead of micro-HTML, and even let people use Markdown or LaTeX in their Zotero entries and any emitted CSL-JSON.
I think I would keep the citeproc-js micro-HTML syntax around for other output formats, because that way all of the pandoc baggage can still be stripped out of the WebAssembly distribution. For ease of development, I also won't dive too hard into the Lua approach: correctness doesn't depend on it, it can be added later, and all of the JSON interop is already functional. It also turns out there's another serde_json-supporting pandoc-types Rust crate that already has tree-walking, pandoc-ast <https://github.com/oli-obk/pandoc-ast>, so I think that problem might be mostly solved.

On Thursday, December 13, 2018 at 8:07:17 AM UTC+11, Paulo Ney de Souza wrote:
>
> Here is a sample of elaborate titles that "really" happen in the wild --
> these are taken from the Proceedings of the ICM 2018:
>
> title = {{$\bold Z$-theory: chasing ${\mathfrak m}/f$ theory}},
> title = {New examples of complete Calabi--Yau metrics on
> $\mathbb{C}^n$ for $n\ge 3$},
> title = {Uniqueness of the group measure space decomposition for
> {P}opa's {$\mathscr{HT}$} factors},
> title = {Actions of {$\mathbb F_\infty$} whose {${\rm II}_1$} factors
> and orbit equivalence relations have prescribed fundamental group},
> title = {Cocycle and orbit superrigidity for lattices in {${\rm
> SL}(n,\mathbb R)$} acting on homogeneous spaces},
> title = {Profinite rigidity of $\mathbf{PGL}(2,{\Z}[\omega])$ and
> $\mathbf{PSL}(2,{\Z}[\omega])$},
> title = {Representation of measures with polynomial denseness in
> {$L_p(\mathbb R,d\mu)$}, {$0<p<\infty$}, and its application to determinate
> moment problems},
> title = {Zimmer's conjecture for actions of
> $\mathrm{SL}(m,\mathbb{Z})$},
> title = {Can lattices in {${\rm SL}(n,\mathbb R)$} act on the circle?},
> title = {Higher {T}eichm\"uller spaces: from {${\rm SL}(2,\mathbb R)$}
> to other {L}ie groups},
> title = {Exponential decay of connection probabilities for subcritical
> Voronoi percolation in $\mathbb{R}^d$},
> title = {A {KAM} scheme for {${\rm
> SL}(2,\mathbb R)$} cocycles with
> {L}iouvillean frequencies},
> title = {On dynamics of {$Out(F_n)$} on {$\mathrm{PSL}_2({\mathbb
> C})$} characters},
> title = {General topology meets model theory, on {$\mathfrak p$} and
> {$\mathfrak t$}},
> title = {New classes of {${\mathcal L}\sp{p}$}-spaces},
> title = {A class of special {${\mathcal L}\sb{\infty }$}\ spaces},
> title = {More {$\ell_r$} saturated {$\mathscr L^\infty$} spaces},
> title = {The {${\mathcal L}\sb{p}$} spaces},
> title = {A remark on bases in {${\mathcal L}\sb{p}$}-spaces with an
> application to complementably universal {${\mathcal L}\sb{\infty
> }$}-spaces},
> title = {{${\rm SL}(2,\mathbb C)$} {C}hern-{S}imons theory and the
> asymptotic behavior of the colored {J}ones polynomial},
> title = {K-polystability of {${\mathbb Q}$}-{F}ano varieties admitting
> {K}\"ahler-{E}instein metrics},
> title = {Weak geodesic rays in the space of {K}\"ahler potentials and
> the class {$\mathcal{E}(X,\omega)$}},
> title = {Operator-algebraic superridigity for {${\rm SL}_n(\mathbb
> Z)$}, {$n\geq 3$}},
> title = {The space of closed subgroups of {$\mathbb R^n$} is
> stratified and simply connected},
> title = {The irreducible representations of the {L}ie algebra
> {${\mathfrak s}{\mathfrak l}(2)$}\ and of the {W}eyl algebra},
> title = {Singular {G}elfand-{T}setlin modules of
> {${\mathfrak{gl}}(n)$}},
> title = {Families of irreducible singular Gelfand-Tsetlin modules of
> $\mathfrak{gl}(n)$},
> title = {Infinite-dimensional representations of the {L}ie algebra
> {$\mathfrak{gl}(n,{\mathbb C})$} related to complex analogs of the
> {G}elfand-{T}setlin patterns and general hypergeometric functions on the
> {L}ie group {${\rm GL}(n,{\mathbb C})$}},
> title = {A geometric approach to 1-singular {G}elfand--{T}setlin
> {$\mathfrak{gl}_n$}-modules},
> title = {Geometric approach to $p$-singular Gelfand--Tsetlin
> $\mathfrak {gl}_n$-modules},
> title = {On some {B}ruhat decomposition and the structure of the
> {H}ecke rings of {${\mathfrak p}$}-adic {C}hevalley groups},
> title = {Stable $s$-minimal cones in $\mathbb{R}^3$ are flat for
> $s\sim 1$},
> title = {Delaunay type domains for an overdetermined elliptic problem
> in {$\mathbb S^n\times\mathbb R$} and {$\Bbb H^n\times\Bbb R$}},
> title = {A {KAM} scheme for {${\rm SL}(2,\mathbb R)$} cocycles with
> {L}iouvillean frequencies},
>
>
> Paulo Ney
>
>
> On Wed, Dec 12, 2018 at 12:39 PM John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
>
>> Cormac Relf <w...-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:
>>
>> > The BibTex parsing is a tricky one, though. There’s this
>> > <https://github.com/charlesvdv/nom-bibtex> for the main syntax, at least. I
>> > wouldn’t want to fork out to Pandoc for every single latex text field, but
>> > maybe the Lua API’s read would help here.
>>
>> That's an interesting point. If your parser just
>> parsed the fields as RawInline (Format "latex") ---,
>> you could have the lua filter do a separate pass at
>> the beginning to try to convert all of these into
>> native pandoc inlines using read.
>>
>> > It might be simpler to support
>> > both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX with a
>> > simple Rust-based parser, but not at the same time. What do people use
>> > backslash commands for in BibTeX? Are there names and document titles out
>> > there that really need the whole power of LaTeX to render? I might have to
>> > think about this some more. Perhaps a successor to CSL-JSON that accepts
>> > arbitrary JSON objects wherever the old one accepts strings.
>>
>> In practice, a fairly small subset of LaTeX would be
>> enough to handle most of what you find in bibtex
>> bibliographies.
>>
>> Certainly you will find things like `\emph`, inline
>> math, and lots of escape characters like `\"{a}`.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>> To post to this group, send email to pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/yh480kk1keeazt.fsf%40johnmacfarlane.net.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/786c8104-1297-465e-9cd9-d3c720e6685e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
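A rough sketch of the "similarly limited micro-LaTeX with a simple Rust-based parser" idea from the exchange above. It covers only the subset John lists (`\emph`, inline math, accent escapes like `\"{a}`) plus BibTeX's case-protecting braces, and emits citeproc-js micro-HTML, so the rest of the pipeline would only ever see one markup syntax. Everything here is an assumption for illustration, not citeproc-rs code: the name `latex_to_micro_html`, mapping `{...}` to `<span class="nocase">`, copying `$...$` through as escaped text, keeping unknown commands such as `\bold` verbatim, and emitting accents as a base letter plus a Unicode combining mark (a real version might normalise to NFC afterwards).

```rust
/// Accent control symbols (the character after `\`) mapped to Unicode combining marks.
fn combining_mark(cmd: char) -> Option<char> {
    match cmd {
        '"' => Some('\u{0308}'),
        '\'' => Some('\u{0301}'),
        '`' => Some('\u{0300}'),
        '^' => Some('\u{0302}'),
        '~' => Some('\u{0303}'),
        _ => None,
    }
}

/// Escape the characters that are significant in micro-HTML.
fn push_escaped(out: &mut String, c: char) {
    match c {
        '&' => out.push_str("&amp;"),
        '<' => out.push_str("&lt;"),
        '>' => out.push_str("&gt;"),
        _ => out.push(c),
    }
}

struct Parser<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl Parser<'_> {
    /// Consume input until it runs out or an unmatched `}` is consumed.
    fn parse_until_close(&mut self, out: &mut String) {
        while let Some(c) = self.chars.next() {
            match c {
                '}' => return,
                '{' => {
                    // BibTeX case protection, as in {T}eichm\"uller
                    out.push_str("<span class=\"nocase\">");
                    self.parse_until_close(out);
                    out.push_str("</span>");
                }
                '$' => {
                    // Inline math is copied verbatim (escaped) through the closing `$`.
                    out.push('$');
                    for m in self.chars.by_ref().take_while(|&m| m != '$') {
                        push_escaped(out, m);
                    }
                    out.push('$');
                }
                '\\' => self.parse_command(out),
                _ => push_escaped(out, c),
            }
        }
    }

    /// Handle whatever follows a backslash (a lone trailing `\` is dropped).
    fn parse_command(&mut self, out: &mut String) {
        let Some(c) = self.chars.next() else { return };
        if let Some(mark) = combining_mark(c) {
            // The argument is `{a}` or a bare `a`; nested braces are not handled.
            let arg: String = match self.chars.next() {
                Some('{') => self.chars.by_ref().take_while(|&x| x != '}').collect(),
                Some(x) => x.to_string(),
                None => String::new(),
            };
            for (i, ch) in arg.chars().enumerate() {
                push_escaped(out, ch);
                if i == 0 {
                    out.push(mark); // decomposed form: base letter, then combining mark
                }
            }
        } else if !c.is_ascii_alphabetic() {
            // Control symbols such as \& \% \$ \{ \} and "\ " become the literal character.
            push_escaped(out, c);
        } else {
            let mut name = c.to_string();
            while let Some(n) = self.chars.next_if(|n| n.is_ascii_alphabetic()) {
                name.push(n);
            }
            if matches!(name.as_str(), "emph" | "textit") && self.chars.next_if_eq(&'{').is_some() {
                out.push_str("<i>");
                self.parse_until_close(out);
                out.push_str("</i>");
            } else {
                // Unknown or argument-less commands are kept verbatim rather than dropped.
                out.push('\\');
                out.push_str(&name);
            }
        }
    }
}

/// Convert a BibTeX field in a small LaTeX subset to citeproc-js micro-HTML.
pub fn latex_to_micro_html(field: &str) -> String {
    let mut parser = Parser { chars: field.chars().peekable() };
    let mut out = String::new();
    // A stray top-level `}` ends parse_until_close early; resume until input runs out.
    while parser.chars.peek().is_some() {
        parser.parse_until_close(&mut out);
    }
    out
}
```

For example, `latex_to_micro_html(r#"Higher {T}eichm\"uller spaces"#)` returns `Higher <span class="nocase">T</span>eichmu\u{308}ller spaces`, i.e. a `u` followed by U+0308 COMBINING DIAERESIS. Escaping `<` and `&` matters for titles like `{$0<p<\infty$}` from the sample above.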
* Re: Experimental citeproc implementation in Rust
From: Cormac Relf @ 2018-12-17 4:37 UTC
To: pandoc-discuss

Update: Haskell is so incredibly hard to link to that I don't think the FFI approach is good enough for now. If cabal ever implements the "type: native-static" option in the foreign-library stanza, then I think it could work, but until then, no. Without that, the options boil down to distributing thirty enormous dynamic libraries or using a custom build of GHC with -fPIC, which is just too much work. I was optimistic, but that was before I had dived into the world of Haskell tooling. It works fine for binaries, but practically nobody links to Haskell libraries from non-Haskell code, so nobody has really thought about how to make it easy.

As an alternative, I think I might strip the pandoc-citeproc program down to its library-reading parts, emit a sub- and super-set of CSL-JSON with Pandoc AST inlines instead of strings, and parse the resulting stdout from Rust. It wouldn't be an addition to the CSL-JSON spec; it would just come with a big honking warning saying that the output is unstable, unspecified and for internal use only.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6cea66b7-a6e3-438f-8000-9c8ed32e91f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
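A minimal sketch of how the Rust side might model that internal "CSL-JSON, but fields may be Pandoc inlines" format. It assumes a heavily reduced, hypothetical subset of Pandoc's inline constructors (`Str`, `Space`, `Emph`, `Strong`, `Math`) and hypothetical names (`FieldValue`, `Reference`, `to_plain`). A real reader would presumably deserialize pandoc-citeproc's stdout with serde_json, or reuse the pandoc-ast crate's types; this std-only version models only the "string or inlines" shape of each field.

```rust
use std::collections::BTreeMap;

/// A deliberately tiny, hypothetical subset of Pandoc's inline constructors.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Space,
    Emph(Vec<Inline>),
    Strong(Vec<Inline>),
    /// Inline TeX math, kept as source text.
    Math(String),
}

/// A field in the internal (unstable, unspecified) format: either an ordinary
/// CSL-JSON string or the Pandoc inlines that the stripped-down reader produced.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldValue {
    Plain(String),
    Inlines(Vec<Inline>),
}

/// One reference: the usual CSL-JSON `id` and `type`, plus the other fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Reference {
    pub id: String,
    pub csl_type: String,
    pub fields: BTreeMap<String, FieldValue>,
}

fn flatten(inlines: &[Inline], out: &mut String) {
    for inline in inlines {
        match inline {
            Inline::Str(s) => out.push_str(s),
            Inline::Space => out.push(' '),
            // Formatting is dropped; only the text content survives.
            Inline::Emph(xs) | Inline::Strong(xs) => flatten(xs, out),
            Inline::Math(tex) => {
                out.push('$');
                out.push_str(tex);
                out.push('$');
            }
        }
    }
}

impl FieldValue {
    /// Plain-text view of a field, e.g. for sort keys, where markup is irrelevant.
    pub fn to_plain(&self) -> String {
        match self {
            FieldValue::Plain(s) => s.clone(),
            FieldValue::Inlines(xs) => {
                let mut out = String::new();
                flatten(xs, &mut out);
                out
            }
        }
    }
}
```

Keeping a `Plain` variant next to `Inlines` means ordinary CSL-JSON input still fits the same type, which is one way to get the "sub- and super-set" behaviour without touching the CSL-JSON spec itself.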
* Re: Experimental citeproc implementation in Rust
From: Cormac Relf @ 2019-02-08 12:47 UTC
To: pandoc-discuss

My current thinking here is that you could do the CSL-JSON pre-processing in Lua, using the tiny json.lua <https://github.com/rxi/json.lua>. That would be better in many ways than distributing a binary wrapper around Text.Pandoc, which would probably still weigh 20MB, as pandoc-citeproc does. I tried it out here:
<https://github.com/cormacrelf/citeproc-rs/tree/d9065738589c62dbb8fcebca90bca902b1996a1b/pandoc-preproc>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/41f8966a-f1da-4b7e-ac2e-b807f661af22%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Experimental citeproc implementation in Rust
From: John MacFarlane @ 2019-02-08 17:47 UTC
To: Cormac Relf, pandoc-discuss

If you just need JSON marshalling, then pandoc-types should be all you need; you shouldn't need full pandoc. I don't know if that helps.

Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:

> My current thinking here is that you could do CSL-JSON pre-processing in
> Lua, using the tiny json.lua <https://github.com/rxi/json.lua>. That would
> be better in many ways than distributing a binary wrapper around
> Text.Pandoc which would still probably weigh 20MB like pandoc-citeproc
> does. I tried it out here.
> <https://github.com/cormacrelf/citeproc-rs/tree/d9065738589c62dbb8fcebca90bca902b1996a1b/pandoc-preproc>
Thread overview: 9+ messages
2018-12-11 17:04 Experimental citeproc implementation in Rust Cormac Relf
2018-12-11 18:44 ` John MacFarlane
2018-12-12  9:21   ` Cormac Relf
2018-12-12 20:38     ` John MacFarlane
2018-12-12 21:07       ` Paulo Ney de Souza
2018-12-13  4:02         ` Cormac Relf
2018-12-17  4:37           ` Cormac Relf
2019-02-08 12:47             ` Cormac Relf
2019-02-08 17:47               ` John MacFarlane