From: John MacFarlane
To: Cormac Relf, pandoc-discuss
Newsgroups: gmane.text.pandoc
Subject: Re: Experimental citeproc implementation in Rust
Date: Tue, 11 Dec 2018 10:44:19 -0800
References: <78b7f42d-7640-45ff-a359-f59355217af8@googlegroups.com>

That's an interesting idea. pandoc-citeproc is still pretty crufty,
and it doesn't always behave like citeproc-js, so I can see the point
of this. The difficulties are that

- pandoc-citeproc is currently quite tightly integrated with pandoc;
  it operates on the pandoc AST. So as you note, that capability
  would have to be reproduced somehow in citeproc-rs. I think that
  the tree-walking work could be given to a lua filter that either
  called out to citeproc-rs or linked to a version of it. (I don't
  think luajit is required for this; one can write lua modules in C,
  so it should be possible to do it in rust.) But citeproc-rs would
  still have to be able to handle pandoc JSON.
  Perhaps that could just be the underlying format it operates on (it
  would have to replace the current HTML-ish syntax used in
  citeproc-js, and maybe it would have to be made more expressive).

- One potential problem is that citeproc-rs would need to change,
  sometimes, when pandoc does. Currently that's not a problem since I
  maintain pandoc-citeproc.

- pandoc-citeproc does some things citeproc-js does not do (these
  are, strictly speaking, extensions to standard citeproc). For
  example, author-in-text citations, citation prefixes and suffixes,
  proper handling of math (that's actually just folded into general
  pandoc support), movement of punctuation, conversion from
  bibtex/biblatex and other formats. Note that conversion from bibtex
  relies on pandoc's latex parser; to reproduce this functionality,
  you'd have to write a latex parser in rust or somehow call out to
  pandoc.

Best,
John

Cormac Relf writes:

> Hi,
>
> I've been working on https://github.com/cormacrelf/citeproc-rs, an
> experimental new CSL and CSL-M citation processor written in Rust. The
> one tracking issue gives a rough overview of how early this is in
> development. It can't do name blocks yet, let alone disambiguation or
> structured bibliographies, but there are promising foundations. The
> coolest feature so far is the error reporting at parse time. Try
> running it on a style with errors. Contributions or support would be
> welcome.
>
> I'm raising it here because there's an interesting possibility that
> could come out of it, that touches the Pandoc platform.
>
> - It could *replace citeproc-js* by compiling to WebAssembly that
>   would run in Zotero, browsers and Node.
>   - This is one good reason to use Rust, which has excellent WASM
>     tooling. I have nothing against Haskell or working on
>     pandoc-citeproc directly, but Haskell WASM support is just not
>     there yet.
> - It could *feasibly also replace pandoc-citeproc*, and in fact can
>   already build some pandoc JSON output.
> - It could feasibly *also* replace almost *every other citeproc* by
>   exposing a native static library on every target the Rust/LLVM
>   ecosystem supports. That could be wrapped in e.g. PHP, Ruby, Python,
>   and Java, which all have FFI support. It's weird to me that nobody
>   has built a lingua-franca native library yet, given how complex the
>   specification is. It's a similar situation to libxml2 or libgit2:
>   big, complex, but solve-once-use-everywhere.
>
> That's one ring to rule them all, all in a single codebase, fewer
> competing implementations, more uniform output across CSL tools and
> less work for the community on both bugfixing and CSL evolution. There
> are also long-standing bugs in pandoc-citeproc and citeproc-js that
> I'm aiming to fix in the process, alongside some reworking of the
> less-complete or less-thought-out extended features like citeproc-js'
> abbreviations or the fairly hacky and rigid author suppression in both
> pandoc and citeproc-js.
>
> The second point on that evil plan, replacing pandoc-citeproc, is a
> bit tricky, and might need a bit of thinking through, given that:
>
> - Using FFI from a Haskell pandoc-citeproc that handles the Pandoc
>   parts is a bit... I don't know.
>   - Imagine: pandoc-citeproc deserializes a big JSON document, walks
>     it, parses [@doe, 31] syntax, collects a bunch of cites (with cite
>     IDs attached) and then FFIs out the rest of the job, attaching
>     pandoc JSON to the relevant points at the other end. There would
>     be quite a lot of weird conversions and serialization in this,
>     because Text.Pandoc.Definition doesn't and shouldn't provide a C
>     ABI-compatible memory layout, but it might work.
> - You could replace the entire pandoc-citeproc JSON filter with a new
>   binary, but the Lua API exists for a reason. Maybe if there's a
>   bunch of work going on, avoiding double-JSON should be one of the
>   goals. Is that something that should be written with a Lua FFI
>   wrapper around citeproc-rs (i.e. the libciteproc static library it
>   builds)? Setting aside the tricky problems with how to return owned
>   datastructures over FFI without leaking memory, FFI is only
>   available with LuaJIT, which as I understand it would have to become
>   a system dependency for Pandoc through an hslua constraint that has
>   not been specified in official Pandoc builds so far. In the
>   alternative, it wouldn't be too hard to maintain a JSON filter for
>   non-LuaJIT installs, but it sure would be confusing for users to
>   have two ways for different platforms or configurations. Maybe JSON
>   is good enough, and maybe serde_json is so fast it won't matter in
>   the end. It would certainly be much simpler.
> - pandoc-citeproc includes syntax parsing that kinda defines part of
>   Pandoc Markdown (i.e. [@doe, 33]), so that would be moving further
>   out of tree than it already is. There is a good parser combinator
>   library, at least (nom), that could replicate the Parsec code in a
>   way that's fairly comprehensible by Haskell developers. Some of the
>   more advanced display/formatting features of CSL also need support
>   from Pandoc output templates to work correctly. Are we okay with all
>   of that?
>
> If anyone has any input on these interop problems, I'd love to hear
> it. At the moment, it looks like the way forward is to replace the
> pandoc-citeproc binary wholesale, speaking JSON and taking on all the
> pandoc-specific features in Rust.
>
> Cormac
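
For readers following the thread, the "walk the JSON document and
collect the cites" step discussed above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not
code from pandoc-citeproc or citeproc-rs: it is written in Python
rather than Lua or Rust purely for brevity, the function name
collect_citation_ids and the sample document are hypothetical, and the
field names ("t", "c", "citationId") follow pandoc's JSON
serialization of Cite elements as I understand it.

```python
import json

def collect_citation_ids(node, found=None):
    """Walk a pandoc JSON AST and return citation IDs in document order.

    Hypothetical helper for illustration only.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # A Cite element carries [citations, fallback inlines].
            citations, _inlines = node["c"]
            found.extend(c["citationId"] for c in citations)
        else:
            for value in node.values():
                collect_citation_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_citation_ids(item, found)
    # Strings, numbers and None carry no citations and are ignored.
    return found

# Hand-written sample document (an assumption, not real pandoc output).
doc = json.loads("""
{"pandoc-api-version": [1, 17, 5, 4],
 "meta": {},
 "blocks": [
  {"t": "Para", "c": [
    {"t": "Str", "c": "See"},
    {"t": "Space"},
    {"t": "Cite", "c": [
      [{"citationId": "doe",
        "citationPrefix": [],
        "citationSuffix": [{"t": "Str", "c": ","}, {"t": "Space"},
                           {"t": "Str", "c": "31"}],
        "citationMode": {"t": "NormalCitation"},
        "citationNoteNum": 0,
        "citationHash": 0},
       {"citationId": "smith",
        "citationPrefix": [],
        "citationSuffix": [],
        "citationMode": {"t": "NormalCitation"},
        "citationNoteNum": 0,
        "citationHash": 0}],
      [{"t": "Str", "c": "[@doe, 31; @smith]"}]]}]}]}
""")

print(collect_citation_ids(doc))
```

In an actual JSON filter the document would arrive on standard input
from pandoc instead of being embedded as a string, and the collected
cites would be handed to the citation processor, whose rendered output
would then be spliced back in place of each Cite element. That second
half is where the interop questions raised in this thread come in.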