From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: WIP: better citation processing
Date: Fri, 21 Aug 2020 12:06:33 -0700
To: Joseph Reagle, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Joseph Reagle writes:

> On 8/16/20 6:59 PM, John MacFarlane wrote:
>> I've used this library to create a new filter,
>> new-pandoc-citeproc, which now passes most of the pandoc-citeproc
>> test suite but runs around 6 times faster.
>
> Awesome! I've topped 10k items in my YAML file. Even though you kindly
> sped up some citeproc bottlenecks for me in the past, I continue to
> "subset" the 10k file into document-specific YAML files when building
> a document. That is, it's faster for me to regex my document for
> citations and pull their entries out of the 10k YAML file before
> handing it off to pandoc than just using pandoc itself.

There are two separate issues here:

1. Parsing the YAML metadata (this is done in pandoc's markdown reader)
2. Processing the CSL (this is done by pandoc-citeproc)

The new library will speed up 2, but it won't affect 1, and I
suspect 1 is the bottleneck for you.
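
For reference, the subsetting approach described in the quoted message
could look roughly like the Python sketch below. It assumes the
bibliography entries are already loaded as a list of dicts with an "id"
field (as in CSL JSON) and that the document cites with pandoc's @key
syntax. The function names and the simplified key pattern are
illustrative assumptions, not part of pandoc.

```python
import re

# Simplified pattern for pandoc-style citation keys such as @smith2020.
# The lookbehind skips e-mail addresses like user@example.com.
# Assumption: a key is alphanumerics/underscores, with punctuation
# allowed only between such runs (so a trailing "." or ";" is excluded).
CITE_KEY = re.compile(
    r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+(?:[:.#$%&+?<>~/-][A-Za-z0-9_]+)*)"
)


def cited_keys(text):
    """Return the set of citation keys found in a markdown string."""
    return set(CITE_KEY.findall(text))


def subset_entries(entries, keys):
    """Keep only the bibliography entries whose "id" is actually cited."""
    return [entry for entry in entries if entry.get("id") in keys]
```

Calling subset_entries(all_entries, cited_keys(document_text)) gives the
per-document bibliography; how the large file is loaded and how the
subset is written back out is left open here.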

For #1, we have this open issue:
https://github.com/jgm/pandoc/issues/6084

The issue could be fixed by switching back from the pure Haskell
HsYAML to the wrapped C library yaml. I hate to do that, though,
because I've been trying to remove all pure C library dependencies
from pandoc (both for security reasons and because they don't work
with e.g. compiling to JavaScript with ghcjs).

Here's a workaround that should work right now. Reading CSL JSON
is fast, so you could try using pandoc-citeproc -j to convert
your YAML bibliography to CSL, then refer to the CSL bibliography
in your pandoc metadata.
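
A minimal Python sketch of that workaround follows. It assumes
pandoc-citeproc is on the PATH, accepts the YAML bibliography as a file
argument, and writes the CSL JSON to standard output. The file names
refs.yaml and refs.json and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def conversion_command(yaml_bib):
    """Build the argv for converting a bibliography to CSL JSON (-j)."""
    return ["pandoc-citeproc", "-j", str(yaml_bib)]


def metadata_block(json_bib):
    """Return a YAML metadata block pointing pandoc at the JSON bibliography."""
    return f"---\nbibliography: {json_bib}\n---\n"


def convert(yaml_bib, json_bib):
    """Run the conversion once and save the result next to the document.

    Assumption: the converted bibliography arrives on standard output.
    """
    result = subprocess.run(
        conversion_command(yaml_bib),
        check=True,
        capture_output=True,
        text=True,
    )
    Path(json_bib).write_text(result.stdout, encoding="utf-8")


# Example (requires pandoc-citeproc to be installed):
#   convert("refs.yaml", "refs.json")
#   print(metadata_block("refs.json"), end="")
```

The conversion only needs to be rerun when the YAML bibliography
changes, so the slow YAML parse is paid once rather than on every
document build.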