From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: WIP: better citation processing
Date: Sun, 16 Aug 2020 15:59:34 -0700

I've been working this summer on a replacement for pandoc-citeproc. Part of this is a new library, citeproc, which is not specific to pandoc. It already passes a larger portion of the CSL test suite than pandoc-citeproc, and it should be straightforward to improve it further. It will be easier to maintain than pandoc-citeproc, as well as more accurate and faster.

I've used this library to create a new filter, new-pandoc-citeproc, which now passes most of the pandoc-citeproc test suite but runs about six times faster.

I'm inclined, though, not to release this as a new filter, but instead to depend on the citeproc library and build citation processing directly into pandoc. This will cut the number of binaries we need to distribute from two to one, and it will simplify things for users, who won't have to worry about filters. It will also perform better, since we'll avoid the overhead of serializing the document to JSON and deserializing it again.

I'm not quite ready to release any of this code, but I hope to do so in the next month or two. This is just a teaser.

The new library is pure Haskell and won't depend on bibutils (a C library that pandoc-citeproc used through a Haskell wrapper). That means we'll only support BibTeX/BibLaTeX, pandoc YAML, and CSL JSON as bibliography formats. Users who need other formats will have to convert them with the standalone bibutils tools. But support for other formats was never great, so I don't think this is a big loss.
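As a rough illustration of what converting with standalone bibutils could look like in practice, here is a small Python sketch that decides whether a bibliography file can be read directly or needs converting first. The helper name `plan_bibliography` and the extension-to-format table are hypothetical, chosen for illustration; they are not part of citeproc or pandoc. The conversion commands assume bibutils' usual approach of going through its MODS XML intermediate format (`ris2xml` or `isi2xml`, then `xml2bib`).

```python
import shlex
from pathlib import Path

# Hypothetical extension-to-format table for the formats the new code supports.
NATIVE_FORMATS = {
    ".bib": "BibLaTeX",
    ".bibtex": "BibTeX",
    ".json": "CSL JSON",
    ".yaml": "pandoc YAML",
}

# A few bibutils converters that turn a foreign format into MODS XML.
BIBUTILS_TO_MODS = {
    ".ris": "ris2xml",
    ".isi": "isi2xml",
}


def plan_bibliography(path):
    """Return ("native", format) or ("convert", shell command) for a file."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext in NATIVE_FORMATS:
        return ("native", NATIVE_FORMATS[ext])
    if ext in BIBUTILS_TO_MODS:
        # Write BibTeX to a .bibtex file so the result maps back to a
        # natively supported entry in the table above.
        target = p.with_suffix(".bibtex")
        converter = BIBUTILS_TO_MODS[ext]
        # Pipe through MODS XML: <format>2xml, then xml2bib.
        return (
            "convert",
            f"{converter} {shlex.quote(str(p))} | xml2bib > {shlex.quote(str(target))}",
        )
    raise ValueError(f"unsupported bibliography format: {ext or path}")
```

For example, `plan_bibliography("refs.ris")` returns `("convert", "ris2xml refs.ris | xml2bib > refs.bibtex")`, while `plan_bibliography("refs.json")` returns `("native", "CSL JSON")`. Routing every foreign format through MODS keeps the table short: supporting another input only needs its `*2xml` converter.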