public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: Joseph Reagle
	<joseph.2011-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: WIP: better citation processing
Date: Fri, 21 Aug 2020 12:06:33 -0700
Message-ID: <m2v9hbbyyu.fsf@johnmacfarlane.net>
In-Reply-To: <e94c6f9d-e6d8-099a-4bf2-7aed30476a6c-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>

Joseph Reagle <joseph.2011-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> writes:

> On 8/16/20 6:59 PM, John MacFarlane wrote:
>> I've used this library to create a new filter,
>> new-pandoc-citeproc, which now passes most of the pandoc-citeproc
>> test suite but runs around 6 times faster.
>
> Awesome! I've topped 10k items in my YAML file. Even though you kindly sped up some citeproc bottlenecks for me in the past, I continue to "subset" the 10k file into document-specific YAML files when building a document. That is, it's faster for me to regex my document for citations and pull their entries out of the 10k YAML file before handing it off to pandoc than just using pandoc itself.

There are two separate issues here:

1.  Parsing the YAML metadata (this is done in pandoc's markdown reader)
2.  Processing the CSL (this is done by pandoc-citeproc)

The new library will speed up 2, but it won't affect 1, and I
suspect 1 is the bottleneck for you.

For #1, we have this open issue:

https://github.com/jgm/pandoc/issues/6084

The issue could be fixed by switching back from the pure Haskell
HsYAML library to the yaml package, which wraps a C library.  I hate
to do that, though, because I've been trying to remove all C library
dependencies from pandoc (both for security reasons and because
they don't work with, e.g., compiling to JavaScript with ghcjs).

Here's a workaround that should work right now.  Reading CSL JSON
is fast, so you could try using pandoc-citeproc -j to convert
your YAML bibliography to CSL JSON, then refer to the resulting
JSON bibliography in your pandoc metadata.
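
The subsetting step Joseph describes can still be combined with this
workaround.  Below is a minimal sketch in Python (standard library
only), assuming the full bibliography has already been converted to a
CSL JSON file, i.e. a list of items each carrying an "id".  It keeps
only the entries cited in a document.  The function names are
hypothetical, and the regex only approximates pandoc's citation-key
syntax; it is not pandoc's own parser.

```python
import json
import re

# Approximation of pandoc's citation-key syntax (an assumption, not pandoc's
# exact grammar): a key follows "@", starts with a word character, and may
# contain internal punctuation such as ":" "." "-" "/" only when more word
# characters follow, so "@doe99." yields "doe99". The lookbehind skips
# e-mail addresses such as "jgm@example.org".
CITE_KEY = re.compile(r"(?<![\w@])@(\w+(?:[:.#$%&\-+?<>~/]\w+)*)")


def cited_keys(markdown_text):
    """Return the set of citation keys that appear in a markdown text."""
    return set(CITE_KEY.findall(markdown_text))


def subset_bibliography(csl_items, keys):
    """Keep only the CSL JSON items whose "id" is one of the cited keys."""
    return [item for item in csl_items if item.get("id") in keys]


def subset_file(markdown_path, full_json_path, out_json_path):
    """Write a document-specific CSL JSON file (hypothetical helper).

    Returns the number of bibliography entries written.
    """
    with open(markdown_path, encoding="utf-8") as f:
        keys = cited_keys(f.read())
    with open(full_json_path, encoding="utf-8") as f:
        items = json.load(f)
    subset = subset_bibliography(items, keys)
    with open(out_json_path, "w", encoding="utf-8") as f:
        json.dump(subset, f, ensure_ascii=False, indent=2)
    return len(subset)


if __name__ == "__main__":
    doc = "As @smith04 argues [see @doe99, p. 33; -@roe2001]. Mail jgm@example.org."
    full = [{"id": "smith04"}, {"id": "doe99"}, {"id": "roe2001"}, {"id": "unused"}]
    print(sorted(cited_keys(doc)))
    print([item["id"] for item in subset_bibliography(full, cited_keys(doc))])
```

Run as a script, this prints ['doe99', 'roe2001', 'smith04'] and then
['smith04', 'doe99', 'roe2001']; the e-mail address is ignored.  If the
regex over-matches, the stray key simply finds no entry, so the subset
errs on the side of completeness.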
