public inbox archive for pandoc-discuss@googlegroups.com
* Experimental citeproc implementation in Rust
@ 2018-12-11 17:04 Cormac Relf
       [not found] ` <78b7f42d-7640-45ff-a359-f59355217af8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Cormac Relf @ 2018-12-11 17:04 UTC (permalink / raw)
  To: pandoc-discuss



Hi,

I've been working on https://github.com/cormacrelf/citeproc-rs, an 
experimental new CSL and CSL-M citation processor written in Rust. The one 
tracking issue gives a rough overview of how early this is in development. 
It can't do name blocks yet, let alone disambiguation or structured 
bibliographies, but there are promising foundations. The coolest feature so 
far is the error reporting at parse time. Try running it on a style with 
errors like <number variable="issued" />. Contributions or support would 
be welcome.

I'm raising it here because there's an interesting possibility that could 
come out of it that touches the Pandoc platform.

   - It could *replace citeproc-js* by compiling to WebAssembly that would 
   run in Zotero, browsers and Node.
      - This is one good reason to use Rust, which has excellent WASM 
      tooling. I have nothing against Haskell or working on pandoc-citeproc 
      directly, but Haskell WASM support is just not there yet.
   - It could *feasibly also replace pandoc-citeproc*, and in fact can 
   already build some pandoc JSON output.
   - It could feasibly *also* replace almost *every other citeproc* by 
   exposing a native static library on every target the Rust/LLVM ecosystem 
   supports. That could be wrapped in e.g. PHP, Ruby, Python, and Java, which 
   all have FFI support. It's weird to me that nobody has built a 
   lingua-franca native library yet, given how complex the specification is. 
   It's a similar situation to libxml2 or libgit2: big, complex, but 
   solve-once-use-everywhere.
   
That's one ring to rule them all, all in a single codebase, fewer competing 
implementations, more uniform output across CSL tools and less work for the 
community on both bugfixing and CSL evolution. There are also long-standing 
bugs in pandoc-citeproc and citeproc-js that I'm aiming to fix in the 
process, alongside some reworking of the less-complete or less-thought-out 
extended features like citeproc-js' abbreviations or the fairly hacky and 
rigid author suppression in both pandoc and citeproc-js.

The second point on that evil plan, replacing pandoc-citeproc, is a bit 
tricky, and might need a bit of thinking through, given that: 

   - Using FFI from a Haskell pandoc-citeproc that handles the Pandoc parts 
   is a bit... I don't know.
      - Imagine: pandoc-citeproc deserializes a big JSON document, walks 
      it, parses [@doe, 31] syntax, collects a bunch of cites (with cite IDs 
      attached) and then FFIs out the rest of the job, attaching pandoc JSON to 
      the relevant points at the other end. There would be quite a lot of weird 
      conversions and serialization in this, because Text.Pandoc.Definition 
      doesn't and shouldn't provide a C ABI-compatible memory layout, but it 
      might work. 
   - You could replace the entire pandoc-citeproc JSON filter with a new 
   binary, but the Lua API exists for a reason. Maybe if there's a bunch of 
   work going on, avoiding double-JSON should be one of the goals. Is that 
   something that should be written with a Lua FFI wrapper around citeproc-rs 
   (i.e. the libciteproc static library it builds)? Setting aside the tricky 
   problems with how to return owned datastructures over FFI without leaking 
   memory, FFI is only available with LuaJIT, which as I understand it would 
   have to become a system dependency for Pandoc through an hslua constraint 
   that has not been specified in official Pandoc builds so far. In the 
   alternative, it wouldn't be too hard to maintain a JSON filter for 
   non-LuaJIT installs, but it sure would be confusing for users to have two 
   ways for different platforms or configurations. Maybe JSON is good enough, 
   and maybe serde_json is so fast it won't matter in the end. It would 
   certainly be much simpler.
   - pandoc-citeproc includes syntax parsing that kinda defines part of 
   Pandoc Markdown (i.e. [@doe, 33]), so that would be moving further out of 
   tree than it already is. There is a good parser combinator library, at 
   least (nom), that could replicate the Parsec code in a way that's fairly 
   comprehensible by Haskell developers. Some of the more advanced 
   display/formatting features of CSL also need support from Pandoc output 
   templates to work correctly. Are we okay with all of that?
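
On the "return owned datastructures over FFI without leaking memory" point, one common pattern is to hand the caller a raw pointer and export a matching free function. The following is only a sketch of that pattern under stated assumptions: the names citeproc_format_cite and citeproc_string_free are hypothetical and are not part of citeproc-rs, and the "rendering" is a placeholder.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// A real staticlib/cdylib would also mark these functions `#[no_mangle]`
// (spelled `#[unsafe(no_mangle)]` on the 2024 edition) so that C callers see
// stable symbol names. It is omitted so the sketch compiles on any edition.

/// Hypothetical entry point: takes a NUL-terminated cite key and returns a
/// newly allocated, NUL-terminated string that the caller now owns.
/// Returns NULL on a NULL input or if the result cannot be represented.
pub extern "C" fn citeproc_format_cite(key: *const c_char) -> *mut c_char {
    if key.is_null() {
        return std::ptr::null_mut();
    }
    // Borrow the caller's bytes; invalid UTF-8 is replaced, never trusted.
    let key = unsafe { CStr::from_ptr(key) }.to_string_lossy();
    // Placeholder for real citation rendering.
    let rendered = format!("({})", key);
    match CString::new(rendered) {
        // `into_raw` transfers ownership to the caller without freeing.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical matching destructor: the caller must pass back every
/// non-NULL pointer from `citeproc_format_cite` exactly once.
pub extern "C" fn citeproc_string_free(s: *mut c_char) {
    if !s.is_null() {
        // `from_raw` retakes ownership so Rust's allocator frees the buffer.
        unsafe { drop(CString::from_raw(s)) };
    }
}
```

The design choice here is that memory allocated by Rust is always freed by Rust; a wrapper in PHP, Ruby, Python or Java would copy the string out and then call the free function.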
   
If anyone has any input on these interop problems, I'd love to hear it. At 
the moment, it looks like the way forward is to replace the pandoc-citeproc 
binary wholesale, speaking JSON and taking on all the pandoc-specific 
features in Rust.
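
To give a feel for what "taking on the pandoc-specific features" might involve, here is a rough, hand-rolled sketch of parsing a bracketed citation such as [@doe, 33]. It is an assumption-laden simplification and not Pandoc's actual grammar: the Cite struct, its field names and the accepted key characters are all made up for illustration, and a real port would more likely use a parser combinator library such as nom.

```rust
/// One cite inside a bracketed citation. Field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct Cite {
    pub prefix: String,
    pub key: String,
    pub suppress_author: bool,
    pub suffix: String,
}

/// Parses a simplified subset of the bracketed citation syntax:
/// `[prefix @key, suffix; -@other]`. Returns None if the brackets are
/// missing or any `;`-separated part lacks an `@key`.
pub fn parse_citation(input: &str) -> Option<Vec<Cite>> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    inner.split(';').map(parse_cite).collect()
}

fn parse_cite(part: &str) -> Option<Cite> {
    let at = part.find('@')?;
    let before = &part[..at];
    // A `-` immediately before `@` asks for the author to be suppressed.
    let suppress_author = before.ends_with('-');
    let prefix = before.strip_suffix('-').unwrap_or(before).trim();
    let rest = &part[at + 1..];
    // Simplifying assumption: keys are alphanumerics plus `_`, `-` and `:`.
    let key_len = rest
        .find(|c: char| !(c.is_alphanumeric() || "_-:".contains(c)))
        .unwrap_or(rest.len());
    if key_len == 0 {
        return None;
    }
    let (key, after) = rest.split_at(key_len);
    // Everything after the key (minus a leading comma) is the suffix/locator.
    let suffix = after.trim_start_matches(',').trim();
    Some(Cite {
        prefix: prefix.to_owned(),
        key: key.to_owned(),
        suppress_author,
        suffix: suffix.to_owned(),
    })
}
```

For example, parse_citation("[see @doe, 33; -@smith04]") yields two cites: one with prefix "see", key "doe" and suffix "33", and one with key "smith04" and author suppression set.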

Cormac

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/78b7f42d-7640-45ff-a359-f59355217af8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Experimental citeproc implementation in Rust
       [not found] ` <78b7f42d-7640-45ff-a359-f59355217af8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-12-11 18:44   ` John MacFarlane
       [not found]     ` <yh480kh8fjj43g.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: John MacFarlane @ 2018-12-11 18:44 UTC (permalink / raw)
  To: Cormac Relf, pandoc-discuss


That's an interesting idea.  pandoc-citeproc is still
pretty crufty, and it doesn't always behave like
citeproc-js, so I can see the point of this.

The difficulties are that

- pandoc-citeproc is currently quite tightly
  integrated with pandoc; it operates on the pandoc
  AST.  So as you note, that capability would have to
  be reproduced somehow in citeproc-rs.  I think that
  the tree-walking work could be given to a lua filter
  that either called out to citeproc-rs or linked to
  a version of it.  (I don't think luajit is required
  for this; one can write lua modules in C, so it
  should be possible to do it in rust.) But citeproc-rs would
  still have to be able to handle pandoc JSON. Perhaps
  that could just be the underlying format it operates
  on (it would have to replace the current HTML-ish
  syntax used in citeproc-js, and maybe it would have
  to be made more expressive).

- One potential problem is that citeproc-rs would need to
  change, sometimes, when pandoc does.  Currently
  that's not a problem since I maintain pandoc-citeproc.

- pandoc-citeproc does some things citeproc-js does
  not do (these are, strictly speaking, extensions to
  standard citeproc).  For example, author-in-text
  citations, citation prefixes and suffixes, proper
  handling of math (that's actually just folded into
  general pandoc support), movement of punctuation,
  conversion from bibtex/biblatex and other formats.
  Note that conversion from bibtex relies on pandoc's
  latex parser; to reproduce this functionality, you'd
  have to write a latex parser in rust or somehow call
  out to pandoc.

Best,
John



* Re: Experimental citeproc implementation in Rust
       [not found]     ` <yh480kh8fjj43g.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-12-12  9:21       ` Cormac Relf
       [not found]         ` <9e7db31a-8244-4ac8-800b-25709cedc240-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Cormac Relf @ 2018-12-12  9:21 UTC (permalink / raw)
  To: pandoc-discuss



 

That’s a good point about native lua modules. I’m looking into a safe API 
for that over at rlua <https://github.com/kyren/rlua/issues/105>, but it’s 
clearly possible in unsafe Rust.


The output formatting architecture so far doesn’t actually use any 
particular internal format, it’s just a trait (like a Haskell typeclass) 
with an associated type. So PlainText builds Strings and ignores formatting 
(and is very fast), but Pandoc builds Vec<Inline>, where Inline comes from 
the pandoc_types crate. So an unsupported CSL formatting instruction like 
display="block" would be simply ignored in the Pandoc implementation. A 
fully-featured format would encode everything in an Html type that knows 
how to serialize itself at the end. I might change this, given its code 
size implications for the WebAssembly output, as Rust's monomorphisation 
means all formatting-dependent functions would be compiled and emitted 
three times with inlining performed on each.
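
As a rough illustration of the trait-with-associated-type design described above (not the actual citeproc-rs code), the sketch below uses a toy Inline enum as a stand-in for the one in pandoc_types, and the method names text, emph and join are assumptions.

```rust
/// Toy stand-in for the `Inline` type from the pandoc_types crate.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Emph(Vec<Inline>),
}

/// An output format decides what it builds via an associated type.
pub trait OutputFormat {
    type Build;
    fn text(&self, s: &str) -> Self::Build;
    fn emph(&self, inner: Self::Build) -> Self::Build;
    fn join(&self, parts: Vec<Self::Build>) -> Self::Build;
}

/// Builds plain Strings and ignores formatting instructions.
pub struct PlainText;

impl OutputFormat for PlainText {
    type Build = String;
    fn text(&self, s: &str) -> String {
        s.to_owned()
    }
    fn emph(&self, inner: String) -> String {
        inner // emphasis is dropped
    }
    fn join(&self, parts: Vec<String>) -> String {
        parts.concat()
    }
}

/// Builds a vector of Pandoc-style inlines.
pub struct Pandoc;

impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn text(&self, s: &str) -> Vec<Inline> {
        vec![Inline::Str(s.to_owned())]
    }
    fn emph(&self, inner: Vec<Inline>) -> Vec<Inline> {
        vec![Inline::Emph(inner)]
    }
    fn join(&self, parts: Vec<Vec<Inline>>) -> Vec<Inline> {
        parts.into_iter().flatten().collect()
    }
}

/// Formatting code is written once, generically; the compiler emits one
/// copy per format it is used with (the monomorphisation cost noted above).
pub fn render_title<O: OutputFormat>(fmt: &O, title: &str, year: &str) -> O::Build {
    fmt.join(vec![fmt.emph(fmt.text(title)), fmt.text(", "), fmt.text(year)])
}
```

Calling render_title(&PlainText, ...) and render_title(&Pandoc, ...) runs the same logic but produces a String in one case and a Vec<Inline> in the other.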


In the current architecture, you also have *inputs* that are generic over 
the output format. So a Cite is actually specialised for each output format, 
such that its locators and affixes are specialised too, and any 
deserialization would be to a Cite<Pandoc>, which will read 
Pandoc::Build = Vec<Inline> into its affixes. This goes through serde's 
Deserialize trait (via serde_json), which is dead easy; the cost just boils 
down to keeping the types in sync with Pandoc. That could be mitigated by 
doing an incomplete deserialization and leaving unrecognised nodes in 
serialized form, such that new AST nodes wouldn't cause parse errors. But 
that's probably more work than simply doing the maintenance in the first 
place.
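
A minimal sketch of an input type that is generic over the output format might look like the following. The field names, the from_raw constructor and the toy Inline enum are assumptions for illustration; the real code would fill these fields through deserialization rather than a constructor.

```rust
/// Minimal version of the output-format trait (names are assumptions).
pub trait OutputFormat {
    type Build;
    fn text(&self, s: &str) -> Self::Build;
}

pub struct PlainText;

impl OutputFormat for PlainText {
    type Build = String;
    fn text(&self, s: &str) -> String {
        s.to_owned()
    }
}

/// Toy stand-in for the pandoc_types `Inline` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Space,
}

pub struct Pandoc;

impl OutputFormat for Pandoc {
    type Build = Vec<Inline>;
    fn text(&self, s: &str) -> Vec<Inline> {
        // Split on whitespace into Str and Space nodes.
        let mut out = Vec::new();
        for (i, word) in s.split_whitespace().enumerate() {
            if i > 0 {
                out.push(Inline::Space);
            }
            out.push(Inline::Str(word.to_owned()));
        }
        out
    }
}

/// A cite whose affixes are stored in the output format's own representation.
pub struct Cite<O: OutputFormat> {
    pub ref_id: String,
    pub prefix: O::Build,
    pub suffix: O::Build,
}

impl<O: OutputFormat> Cite<O> {
    /// Hypothetical constructor standing in for deserialization.
    pub fn from_raw(fmt: &O, ref_id: &str, prefix: &str, suffix: &str) -> Self {
        Cite {
            ref_id: ref_id.to_owned(),
            prefix: fmt.text(prefix),
            suffix: fmt.text(suffix),
        }
    }
}
```

With this shape, a Cite<Pandoc> carries Vec<Inline> affixes while a Cite<PlainText> carries plain Strings, so the affixes never need converting at render time.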


The BibTeX parsing is a tricky one, though. There’s this 
<https://github.com/charlesvdv/nom-bibtex> for the main syntax, at least. I 
wouldn’t want to fork out to Pandoc for every single LaTeX text field, but 
maybe the Lua API’s read would help here. It might be simpler to support 
both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX with a 
simple Rust-based parser, but not at the same time. What do people use 
backslash commands for in BibTeX? Are there names and document titles out 
there that really need the whole power of LaTeX to render? I might have to 
think about this some more. Perhaps a successor to CSL-JSON that accepts 
arbitrary JSON objects wherever the old one accepts strings.



* Re: Experimental citeproc implementation in Rust
       [not found]         ` <9e7db31a-8244-4ac8-800b-25709cedc240-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-12-12 20:38           ` John MacFarlane
       [not found]             ` <yh480kk1keeazt.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: John MacFarlane @ 2018-12-12 20:38 UTC (permalink / raw)
  To: Cormac Relf, pandoc-discuss

Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:

> The BibTex parsing is a tricky one, though. There’s this 
> <https://github.com/charlesvdv/nom-bibtex> for the main syntax, at least. I 
> wouldn’t want to fork out to Pandoc for every single latex text field, but 
> maybe the Lua API’s read would help here.

That's an interesting point.  If your parser just
parsed the fields as RawInline (Format "latex") ---,
you could have the lua filter do a separate pass at
the beginning to try to convert all of these into
native pandoc inlines using read.
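
A rough sketch of that two-stage idea, written in Rust only to show the shape of the data flow: the Inline enum is a toy stand-in for the Pandoc AST, and the convert callback is a hypothetical placeholder for whatever actually turns LaTeX into inlines (for instance a Lua filter calling read).

```rust
/// Toy stand-in for the Pandoc inline AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Emph(Vec<Inline>),
    RawInline { format: String, text: String },
}

/// Stage one: the BibTeX parser does not interpret the field at all and
/// simply wraps it as a raw LaTeX inline.
pub fn field_to_raw(value: &str) -> Vec<Inline> {
    vec![Inline::RawInline {
        format: "latex".to_owned(),
        text: value.to_owned(),
    }]
}

/// Stage two: a separate pass replaces every raw LaTeX inline with whatever
/// `convert` returns, recursing into nested inlines and leaving all other
/// nodes (including raw inlines in other formats) untouched.
pub fn resolve_raw_latex(
    inlines: Vec<Inline>,
    convert: &dyn Fn(&str) -> Vec<Inline>,
) -> Vec<Inline> {
    let mut out = Vec::new();
    for inline in inlines {
        match inline {
            Inline::RawInline { format, text } if format == "latex" => {
                out.extend(convert(&text))
            }
            Inline::Emph(inner) => out.push(Inline::Emph(resolve_raw_latex(inner, convert))),
            other => out.push(other),
        }
    }
    out
}
```

The point of the split is that the reference parser stays small, and the expensive LaTeX conversion can be done in one batch by a component that already knows how.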

> It might be simpler to support 
> both citeproc-js’ micro-HTML and a similarly limited micro-LaTeX with a 
> simple Rust-based parser, but not at the same time. What do people use 
> backslash commands for in BibTeX? Are there names and document titles out 
> there that really need the whole power of LaTeX to render? I might have to 
> think about this some more. Perhaps a successor to CSL-JSON that accepts 
> arbitrary JSON objects wherever the old one accepts strings.

In practice, a fairly small subset of LaTeX would be
enough to handle most of what you find in bibtex
bibliographies.

Certainly you will find things like `\emph`, inline
math, and lots of escape characters like `\"{a}`.
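
To make "a fairly small subset" concrete, here is a toy parser for just the three things mentioned: `\emph{...}`, inline math, and the `\"` accent. It is a sketch under stated assumptions, not a proposal for the real grammar: the Inline enum and function names are made up, math is kept verbatim rather than interpreted, protective braces are simply dropped, and accents are emitted as base letter plus a combining diaeresis (normalising to precomposed characters would need an extra crate).

```rust
/// Toy inline tree; a stand-in for a real Pandoc `Inline` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Str(String),
    Emph(Vec<Inline>),
    Math(String),
}

/// Parses the hypothetical micro-LaTeX subset into inlines.
pub fn parse_micro_latex(src: &str) -> Vec<Inline> {
    let chars: Vec<char> = src.chars().collect();
    let mut pos = 0;
    parse_until(&chars, &mut pos, None)
}

/// Appends a character to the trailing `Str`, or starts a new one.
fn push_char(out: &mut Vec<Inline>, c: char) {
    if let Some(Inline::Str(s)) = out.last_mut() {
        s.push(c);
        return;
    }
    out.push(Inline::Str(c.to_string()));
}

/// Parses until `close` (or end of input), consuming the closing character.
fn parse_until(chars: &[char], pos: &mut usize, close: Option<char>) -> Vec<Inline> {
    let mut out = Vec::new();
    while *pos < chars.len() {
        let c = chars[*pos];
        *pos += 1;
        if Some(c) == close {
            break;
        }
        match c {
            // Protective braces such as {T}: drop the braces, keep the contents.
            '{' => {
                for item in parse_until(chars, pos, Some('}')) {
                    match item {
                        Inline::Str(s) => s.chars().for_each(|ch| push_char(&mut out, ch)),
                        other => out.push(other),
                    }
                }
            }
            // Inline math is kept verbatim, without interpreting its contents.
            '$' => {
                let start = *pos;
                while *pos < chars.len() && chars[*pos] != '$' {
                    *pos += 1;
                }
                out.push(Inline::Math(chars[start..*pos].iter().collect()));
                if *pos < chars.len() {
                    *pos += 1; // skip the closing '$'
                }
            }
            '\\' => parse_command(chars, pos, &mut out),
            _ => push_char(&mut out, c),
        }
    }
    out
}

/// Handles what follows a backslash: the `\"` accent, `\emph{...}`,
/// a single escaped character, or an unknown command (kept as source text).
fn parse_command(chars: &[char], pos: &mut usize, out: &mut Vec<Inline>) {
    if chars.get(*pos) == Some(&'"') {
        *pos += 1;
        let braced = chars.get(*pos) == Some(&'{');
        if braced {
            *pos += 1;
        }
        match chars.get(*pos) {
            Some(&base) if base != '}' => {
                *pos += 1;
                push_char(out, base);
                push_char(out, '\u{0308}'); // combining diaeresis
            }
            _ => {}
        }
        if braced && chars.get(*pos) == Some(&'}') {
            *pos += 1;
        }
        return;
    }
    let start = *pos;
    while *pos < chars.len() && chars[*pos].is_ascii_alphabetic() {
        *pos += 1;
    }
    let name: String = chars[start..*pos].iter().collect();
    if name.is_empty() {
        // Escaped symbol such as \& or \$: emit the symbol itself.
        if let Some(&c) = chars.get(*pos) {
            *pos += 1;
            push_char(out, c);
        }
    } else if name == "emph" && chars.get(*pos) == Some(&'{') {
        *pos += 1;
        out.push(Inline::Emph(parse_until(chars, pos, Some('}'))));
    } else {
        push_char(out, '\\');
        name.chars().for_each(|c| push_char(out, c));
    }
}
```

Anything outside this subset (other accents, other commands) falls through as literal text, which is the kind of graceful degradation a limited parser would probably need.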


* Re: Experimental citeproc implementation in Rust
       [not found]             ` <yh480kk1keeazt.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-12-12 21:07               ` Paulo Ney de Souza
       [not found]                 ` <CAFVhNZOZuRTuWs9_0P0Rd4DM0udixT-WxOUaykvoz5vjmva71A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Paulo Ney de Souza @ 2018-12-12 21:07 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: web-v7Sng7lNsVbsQp/K+IV0sw


Here is a sample of elaborate titles that "really" happen in the wild --
these are taken from the Proceedings of the ICM 2018:

    title = {{$\bold Z$-theory: chasing ${\mathfrak m}/f$ theory}},
    title = {New examples of complete Calabi--Yau metrics on $\mathbb{C}^n$
for $n\ge 3$},
    title = {Uniqueness of the group measure space decomposition for
{P}opa's {$\mathscr{HT}$} factors},
    title = {Actions of {$\mathbb F_\infty$} whose {${\rm II}_1$} factors
and orbit equivalence relations have prescribed fundamental group},
    title = {Cocycle and orbit superrigidity for lattices in {${\rm
SL}(n,\mathbb R)$} acting on homogeneous spaces},
    title = {Profinite rigidity of $\mathbf{PGL}(2,{\Z}[\omega])$ and
$\mathbf{PSL}(2,{\Z}[\omega])$},
    title = {Representation of measures with polynomial denseness in
{$L_p(\mathbb R,d\mu)$}, {$0<p<\infty$}, and its application to determinate
moment problems},
    title = {Zimmer's conjecture for actions of
$\mathrm{SL}(m,\mathbb{Z})$},
    title = {Can lattices in {${\rm SL}(n,\mathbb R)$} act on the circle?},
    title = {Higher {T}eichm\"uller spaces: from {${\rm SL}(2,\mathbb R)$}
to other {L}ie groups},
    title = {Exponential decay of connection probabilities for subcritical
Voronoi percolation in $\mathbb{R}^d$},
    title = {A {KAM} scheme for {${\rm SL}(2,\mathbb R)$} cocycles with
{L}iouvillean frequencies},
    title = {On dynamics of {$Out(F_n)$} on {$\mathrm{PSL}_2({\mathbb C})$}
characters},
    title = {General topology meets model theory, on {$\mathfrak p$} and
{$\mathfrak t$}},
    title = {New classes of {${\mathcal L}\sp{p}$}-spaces},
    title = {A class of special {${\mathcal L}\sb{\infty }$}\ spaces},
    title = {More {$\ell_r$} saturated {$\mathscr L^\infty$} spaces},
    title = {The {${\mathcal L}\sb{p}$} spaces},
    title = {A remark on bases in {${\mathcal L}\sb{p}$}-spaces with an
application to complementably universal {${\mathcal L}\sb{\infty
}$}-spaces},
    title = {{${\rm SL}(2,\mathbb C)$} {C}hern-{S}imons theory and the
asymptotic behavior of the colored {J}ones polynomial},
    title = {K-polystability of {${\mathbb Q}$}-{F}ano varieties admitting
{K}\"ahler-{E}instein metrics},
    title = {Weak geodesic rays in the space of {K}\"ahler potentials and
the class {$\mathcal{E}(X,\omega)$}},
    title = {Operator-algebraic superridigity for {${\rm SL}_n(\mathbb
Z)$}, {$n\geq 3$}},
    title = {The space of closed subgroups of {$\mathbb R^n$} is stratified
and simply connected},
    title = {The irreducible representations of the {L}ie algebra
{${\mathfrak s}{\mathfrak l}(2)$}\ and of the {W}eyl algebra},
    title = {Singular {G}elfand-{T}setlin modules of
{${\mathfrak{gl}}(n)$}},
    title = {Families of irreducible singular Gelfand-Tsetlin modules of
$\mathfrak{gl}(n)$},
    title = {Infinite-dimensional representations of the {L}ie algebra
{$\mathfrak{gl}(n,{\mathbb C})$} related to complex analogs of the
{G}elfand-{T}setlin patterns and general hypergeometric functions on the
{L}ie group {${\rm GL}(n,{\mathbb C})$}},
    title = {A geometric approach to 1-singular {G}elfand--{T}setlin
{$\mathfrak{gl}_n$}-modules},
    title = {Geometric approach to $p$-singular Gelfand--Tsetlin $\mathfrak
{gl}_n$-modules},
    title = {On some {B}ruhat decomposition and the structure of the
{H}ecke rings of {${\mathfrak p}$}-adic {C}hevalley groups},
    title = {Stable $s$-minimal cones in $\mathbb{R}^3$ are flat for $s\sim
1$},
    title = {Delaunay type domains for an overdetermined elliptic problem
in {$\mathbb S^n\times\mathbb R$} and {$\Bbb H^n\times\Bbb R$}},
    title = {A {KAM} scheme for {${\rm SL}(2,\mathbb R)$} cocycles with
{L}iouvillean frequencies},


Paulo Ney



* Re: Experimental citeproc implementation in Rust
       [not found]                 ` <CAFVhNZOZuRTuWs9_0P0Rd4DM0udixT-WxOUaykvoz5vjmva71A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-12-13  4:02                   ` Cormac Relf
       [not found]                     ` <786c8104-1297-465e-9cd9-d3c720e6685e-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Cormac Relf @ 2018-12-13  4:02 UTC (permalink / raw)
  To: pandoc-discuss



Thank you for this, it's instructive. I can't say I'm enthusiastic about 
starting from scratch on a parser, even a limited one.

After thinking some more, calling the Lua API's read function also won't 
really work. The Rust code still has to run outside a Lua context: in JSON 
mode, as a BibTeX <-> CSL-JSON <-> YAML metadata converter, and to fulfil 
the role pandoc-citeproc takes in things like vim-pandoc's citekey 
completion. These are all fairly non-negotiable features. (Also, because it 
is Rust code that would be parsing reference data, you would need a way 
back into the Lua VM, which is more complicated than just being Lua->Rust 
only.)

It might actually be possible to emulate the current pandoc-citeproc 
approach (Text.CSL.Compat) of linking against Pandoc itself. There is some 
promising work (broken but fixable) over at 
https://github.com/mgattozzi/curryrs on linking the GHC runtime and 
simplifying interop using the type system(s), so it might be possible to 
just define some FFI-able Inline types in a little Haskell wrapper library, 
much like today's Text.CSL.Compat, and link it in. With link-time 
optimisation on both sides (2x LTO), that hopefully wouldn't even pull in 
that much of Pandoc.

In that way, users committed to pandoc output might even get to choose 
their input syntax through a metadata option; you could use the Pandoc HTML 
parser instead of micro-HTML, and even let people use Markdown or LaTeX in 
their Zotero entries and any emitted CSL-JSON. I think I would leave the 
citeproc-js micro-HTML syntax around for other output formats, because that 
means all of the pandoc baggage can still be stripped out for WebAssembly 
distribution.

I also think for ease of development I won't dive too hard into the Lua 
approach, because correctness doesn't depend on it, it can be added later, 
and all of the JSON interop is already functional. It also turns out 
there's another serde_json-supporting pandoc-types Rust crate that already 
has tree-walking here <https://github.com/oli-obk/pandoc-ast>, so I think 
that problem might be mostly solved.

On Thursday, December 13, 2018 at 8:07:17 AM UTC+11, Paulo Ney de Souza 
wrote:
>
> Here is a sample of elaborate titles that "really" happen in the wild -- 
> these are taken from the Proceedings of the ICM 2018:
>
>     title = {{$\bold Z$-theory: chasing ${\mathfrak m}/f$ theory}},
>     title = {New examples of complete Calabi--Yau metrics on 
> $\mathbb{C}^n$ for $n\ge 3$},
>     title = {Uniqueness of the group measure space decomposition for 
> {P}opa's {$\mathscr{HT}$} factors},
>     title = {Actions of {$\mathbb F_\infty$} whose {${\rm II}_1$} factors 
> and orbit equivalence relations have prescribed fundamental group},
>     title = {Cocycle and orbit superrigidity for lattices in {${\rm 
> SL}(n,\mathbb R)$} acting on homogeneous spaces},
>     title = {Profinite rigidity of $\mathbf{PGL}(2,{\Z}[\omega])$ and 
> $\mathbf{PSL}(2,{\Z}[\omega])$},
>     title = {Representation of measures with polynomial denseness in 
> {$L_p(\mathbb R,d\mu)$}, {$0<p<\infty$}, and its application to determinate 
> moment problems},
>     title = {Zimmer's conjecture for actions of 
> $\mathrm{SL}(m,\mathbb{Z})$},
>     title = {Can lattices in {${\rm SL}(n,\mathbb R)$} act on the circle?},
>     title = {Higher {T}eichm\"uller spaces: from {${\rm SL}(2,\mathbb R)$} 
> to other {L}ie groups},
>     title = {Exponential decay of connection probabilities for subcritical 
> Voronoi percolation in $\mathbb{R}^d$},
>     title = {A {KAM} scheme for {${\rm SL}(2,\mathbb R)$} cocycles with 
> {L}iouvillean frequencies},
>     title = {On dynamics of {$Out(F_n)$} on {$\mathrm{PSL}_2({\mathbb 
> C})$} characters},
>     title = {General topology meets model theory, on {$\mathfrak p$} and 
> {$\mathfrak t$}},
>     title = {New classes of {${\mathcal L}\sp{p}$}-spaces},
>     title = {A class of special {${\mathcal L}\sb{\infty }$}\ spaces},
>     title = {More {$\ell_r$} saturated {$\mathscr L^\infty$} spaces},
>     title = {The {${\mathcal L}\sb{p}$} spaces},
>     title = {A remark on bases in {${\mathcal L}\sb{p}$}-spaces with an 
> application to complementably universal {${\mathcal L}\sb{\infty 
> }$}-spaces}, 
>     title = {{${\rm SL}(2,\mathbb C)$} {C}hern-{S}imons theory and the 
> asymptotic behavior of the colored {J}ones polynomial},
>     title = {K-polystability of {${\mathbb Q}$}-{F}ano varieties admitting 
> {K}\"ahler-{E}instein metrics},
>     title = {Weak geodesic rays in the space of {K}\"ahler potentials and 
> the class {$\mathcal{E}(X,\omega)$}},
>     title = {Operator-algebraic superridigity for {${\rm SL}_n(\mathbb 
> Z)$}, {$n\geq 3$}},
>     title = {The space of closed subgroups of {$\mathbb R^n$} is 
> stratified and simply connected},
>     title = {The irreducible representations of the {L}ie algebra 
> {${\mathfrak s}{\mathfrak l}(2)$}\ and of the {W}eyl algebra},
>     title = {Singular {G}elfand-{T}setlin modules of 
> {${\mathfrak{gl}}(n)$}},
>     title = {Families of irreducible singular Gelfand-Tsetlin modules of 
> $\mathfrak{gl}(n)$},
>     title = {Infinite-dimensional representations of the {L}ie algebra 
> {$\mathfrak{gl}(n,{\mathbb C})$} related to complex analogs of the 
> {G}elfand-{T}setlin patterns and general hypergeometric functions on the 
> {L}ie group {${\rm GL}(n,{\mathbb C})$}},
>     title = {A geometric approach to 1-singular {G}elfand--{T}setlin 
> {$\mathfrak{gl}_n$}-modules},
>     title = {Geometric approach to $p$-singular Gelfand--Tsetlin 
> $\mathfrak {gl}_n$-modules},
>     title = {On some {B}ruhat decomposition and the structure of the 
> {H}ecke rings of {${\mathfrak p}$}-adic {C}hevalley groups},
>     title = {Stable $s$-minimal cones in $\mathbb{R}^3$ are flat for 
> $s\sim 1$},
>     title = {Delaunay type domains for an overdetermined elliptic problem 
> in {$\mathbb S^n\times\mathbb R$} and {$\Bbb H^n\times\Bbb R$}},
>
>
> Paulo Ney
>
>
> On Wed, Dec 12, 2018 at 12:39 PM John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
>
>> Cormac Relf <w...-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:
>>
>> > The BibTeX parsing is a tricky one, though. There’s this
>> > <https://github.com/charlesvdv/nom-bibtex> for the main syntax, at
>> > least. I wouldn’t want to fork out to Pandoc for every single LaTeX
>> > text field, but maybe the Lua API’s read would help here.
>>
>> That's an interesting point.  If your parser just
>> parsed the fields as RawInline (Format "latex") ---,
>> you could have the lua filter do a separate pass at
>> the beginning to try to convert all of these into
>> native pandoc inlines using read.
>>
>> > It might be simpler to support both citeproc-js’ micro-HTML and a
>> > similarly limited micro-LaTeX with a simple Rust-based parser, but not
>> > at the same time. What do people use backslash commands for in BibTeX?
>> > Are there names and document titles out there that really need the
>> > whole power of LaTeX to render? I might have to think about this some
>> > more. Perhaps a successor to CSL-JSON that accepts arbitrary JSON
>> > objects wherever the old one accepts strings.
>>
>> In practice, a fairly small subset of LaTeX would be
>> enough to handle most of what you find in bibtex
>> bibliographies.
>>
>> Certainly you will find things like `\emph`, inline
>> math, and lots of escape characters like `\"{a}`.
>
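
For what it's worth, the "fairly small subset" in the quoted reply is small
enough to sketch. What follows is a minimal, hypothetical micro-LaTeX reader
in Rust, not code from citeproc-rs: the Inline type and the parse_field,
parse_into and parse_command names are made up for illustration. It assumes
only \emph{...}, $...$ inline math, the \"{u} / \"u umlaut escape and
BibTeX's case-protecting braces need handling; everything else is passed
through as text.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical output type; not the citeproc-rs or pandoc-types definition.
#[derive(Debug, PartialEq)]
enum Inline {
    Text(String),
    Emph(Vec<Inline>),
    Math(String),
}

/// Map a base letter to its umlaut form, for the few letters this sketch knows.
fn umlaut(c: char) -> Option<char> {
    match c {
        'a' => Some('ä'), 'o' => Some('ö'), 'u' => Some('ü'),
        'A' => Some('Ä'), 'O' => Some('Ö'), 'U' => Some('Ü'),
        _ => None,
    }
}

/// Append a character, merging it into a trailing Text inline if there is one.
fn push_text(out: &mut Vec<Inline>, c: char) {
    if let Some(Inline::Text(s)) = out.last_mut() {
        s.push(c);
        return;
    }
    out.push(Inline::Text(c.to_string()));
}

/// Handle what follows a backslash: an umlaut escape, \emph, or an unknown command.
fn parse_command(chars: &mut Peekable<Chars<'_>>, out: &mut Vec<Inline>) {
    if chars.peek() == Some(&'"') {
        chars.next(); // consume the double quote
        let braced = chars.peek() == Some(&'{');
        if braced {
            chars.next();
        }
        if let Some(base) = chars.next() {
            push_text(out, umlaut(base).unwrap_or(base));
        }
        if braced && chars.peek() == Some(&'}') {
            chars.next();
        }
        return;
    }
    let mut name = String::new();
    while let Some(&c) = chars.peek() {
        if !c.is_ascii_alphabetic() {
            break;
        }
        name.push(c);
        chars.next();
    }
    if name == "emph" && chars.peek() == Some(&'{') {
        chars.next();
        let mut inner = Vec::new();
        parse_into(chars, &mut inner);
        out.push(Inline::Emph(inner));
    } else {
        // Unknown command: keep it verbatim rather than guess at its meaning.
        push_text(out, '\\');
        for c in name.chars() {
            push_text(out, c);
        }
    }
}

/// Parse up to the `}` that closes the current group, or to end of input.
fn parse_into(chars: &mut Peekable<Chars<'_>>, out: &mut Vec<Inline>) {
    while let Some(c) = chars.next() {
        match c {
            '}' => return,
            // Case-protection braces: keep the contents, drop the braces.
            '{' => parse_into(chars, out),
            '$' => {
                // Everything up to the next `$` is kept as uninterpreted math.
                let math: String = chars.by_ref().take_while(|&m| m != '$').collect();
                out.push(Inline::Math(math));
            }
            '\\' => parse_command(chars, out),
            other => push_text(out, other),
        }
    }
}

/// Entry point: parse one BibTeX field value; stray closing braces are dropped.
fn parse_field(src: &str) -> Vec<Inline> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while chars.peek().is_some() {
        parse_into(&mut chars, &mut out);
    }
    out
}
```

Under these assumptions, calling parse_field on
Higher {T}eichm\"uller spaces gives a single Text inline reading
"Higher Teichmüller spaces". Titles like the ICM ones quoted above would
keep their math as Math inlines, but nothing inside the dollar signs is
interpreted.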

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/786c8104-1297-465e-9cd9-d3c720e6685e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #1.2: Type: text/html, Size: 11995 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Experimental citeproc implementation in Rust
       [not found]                     ` <786c8104-1297-465e-9cd9-d3c720e6685e-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-12-17  4:37                       ` Cormac Relf
       [not found]                         ` <6cea66b7-a6e3-438f-8000-9c8ed32e91f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Cormac Relf @ 2018-12-17  4:37 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1641 bytes --]

Update: Haskell is so hard to link to that I don't think the FFI approach 
is good enough for now. If cabal ever implements the "type: native-static" 
option in the foreign-library stanza, then I think it could work, but until 
then, no. Without that, the options boil down to distributing thirty 
enormous dynamic libraries, or using a custom build of GHC with -fPIC, 
which is just too much work. I was optimistic, but that was before I had 
dived into the world of Haskell tooling. It works fine for binaries, but 
practically nobody is linking to Haskell libraries from non-Haskell code, 
so nobody has really thought about how to make it easy.

As an alternative, I think I might strip down the pandoc-citeproc program 
to its library-reading parts, emit a sub- and super-set of CSL-JSON with 
Pandoc AST inlines instead of strings, and parse the resulting stdout from 
Rust. It wouldn't be an addition to the CSL-JSON spec, it would just come 
with a big honking warning saying that the output is unstable, unspecified 
and for internal use only.
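
To make that concrete, here is a rough sketch of what a field carrying
Pandoc AST inlines instead of a string could look like. It is only an
illustration under assumptions: the AstInline type and the helper names are
hypothetical, only three inline kinds are modelled, and the JSON is built by
hand with std so the example stays self-contained (real code would more
likely deserialize with serde_json). The "t"/"c" tags follow Pandoc's JSON
AST; the idea of putting such a list where CSL-JSON has a string is the
unstable, unspecified part.

```rust
// Hypothetical model of a rich-text field value: a list of Pandoc-style
// inlines where ordinary CSL-JSON would carry a plain string.
#[derive(Debug, Clone, PartialEq)]
enum AstInline {
    Str(String),
    Space,
    Emph(Vec<AstInline>),
}

/// Quote a string for JSON. Only `"` and `\` are escaped in this sketch;
/// control characters are not handled.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}

/// Serialize one inline using Pandoc's {"t": tag, "c": contents} layout.
fn inline_to_json(inline: &AstInline) -> String {
    match inline {
        AstInline::Str(s) => format!("{{\"t\":\"Str\",\"c\":{}}}", json_string(s)),
        AstInline::Space => String::from("{\"t\":\"Space\"}"),
        AstInline::Emph(xs) => format!("{{\"t\":\"Emph\",\"c\":{}}}", inlines_to_json(xs)),
    }
}

/// Serialize a list of inlines as a JSON array.
fn inlines_to_json(inlines: &[AstInline]) -> String {
    let parts: Vec<String> = inlines.iter().map(inline_to_json).collect();
    format!("[{}]", parts.join(","))
}

/// Flatten back to the plain string that ordinary CSL-JSON would carry.
fn plain_text(inlines: &[AstInline]) -> String {
    inlines
        .iter()
        .map(|i| match i {
            AstInline::Str(s) => s.clone(),
            AstInline::Space => String::from(" "),
            AstInline::Emph(xs) => plain_text(xs),
        })
        .collect()
}
```

A consumer that only wants text can call plain_text and treat the field
like a normal CSL-JSON string, while the formatting survives for output
formats that can use it.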

[-- Attachment #1.2: Type: text/html, Size: 2102 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Experimental citeproc implementation in Rust
       [not found]                         ` <6cea66b7-a6e3-438f-8000-9c8ed32e91f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-02-08 12:47                           ` Cormac Relf
       [not found]                             ` <41f8966a-f1da-4b7e-ac2e-b807f661af22-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Cormac Relf @ 2019-02-08 12:47 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 993 bytes --]

My current thinking here is that you could do CSL-JSON pre-processing in 
Lua, using the tiny json.lua <https://github.com/rxi/json.lua>. That would 
be better in many ways than distributing a binary wrapper around 
Text.Pandoc which would still probably weigh 20MB like pandoc-citeproc 
does. I tried it out here. 
<https://github.com/cormacrelf/citeproc-rs/tree/d9065738589c62dbb8fcebca90bca902b1996a1b/pandoc-preproc>

[-- Attachment #1.2: Type: text/html, Size: 1428 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Experimental citeproc implementation in Rust
       [not found]                             ` <41f8966a-f1da-4b7e-ac2e-b807f661af22-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-02-08 17:47                               ` John MacFarlane
  0 siblings, 0 replies; 9+ messages in thread
From: John MacFarlane @ 2019-02-08 17:47 UTC (permalink / raw)
  To: Cormac Relf, pandoc-discuss


If you just need json marshalling, then pandoc-types
should be all you need -- you shouldn't need full pandoc.
I don't know if that helps.

Cormac Relf <web-v7Sng7lNsVbsQp/K+IV0sw@public.gmane.org> writes:

> My current thinking here is that you could do CSL-JSON pre-processing in 
> Lua, using the tiny json.lua <https://github.com/rxi/json.lua>. That would 
> be better in many ways than distributing a binary wrapper around 
> Text.Pandoc which would still probably weigh 20MB like pandoc-citeproc 
> does. I tried it out here. 
> <https://github.com/cormacrelf/citeproc-rs/tree/d9065738589c62dbb8fcebca90bca902b1996a1b/pandoc-preproc>


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-02-08 17:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-11 17:04 Experimental citeproc implementation in Rust Cormac Relf
     [not found] ` <78b7f42d-7640-45ff-a359-f59355217af8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2018-12-11 18:44   ` John MacFarlane
     [not found]     ` <yh480kh8fjj43g.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2018-12-12  9:21       ` Cormac Relf
     [not found]         ` <9e7db31a-8244-4ac8-800b-25709cedc240-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2018-12-12 20:38           ` John MacFarlane
     [not found]             ` <yh480kk1keeazt.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2018-12-12 21:07               ` Paulo Ney de Souza
     [not found]                 ` <CAFVhNZOZuRTuWs9_0P0Rd4DM0udixT-WxOUaykvoz5vjmva71A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-12-13  4:02                   ` Cormac Relf
     [not found]                     ` <786c8104-1297-465e-9cd9-d3c720e6685e-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2018-12-17  4:37                       ` Cormac Relf
     [not found]                         ` <6cea66b7-a6e3-438f-8000-9c8ed32e91f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-02-08 12:47                           ` Cormac Relf
     [not found]                             ` <41f8966a-f1da-4b7e-ac2e-b807f661af22-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-02-08 17:47                               ` John MacFarlane

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).