Separating out the parsing of bibliography and stylesheet from the use of same

public inbox archive for pandoc-discuss@googlegroups.com

* Separating out the parsing of bibliography and stylesheet from the use of same
@ 2022-08-26 18:05 Amy de Buitléir
       [not found] ` <32d808e8-3163-4d84-8158-9519db10f9c4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Amy de Buitléir @ 2022-08-26 18:05 UTC (permalink / raw)
  To: pandoc-discuss

My application is a bulk processor for a few hundred files. It uses the 
pandoc API to read them, passes them through a few filters, and then uses 
the pandoc API to write them.

All of the files use the same bibliography and CSL stylesheet. I would like 
to take advantage of this by:

1. parsing/compiling the bibliography and CSL stylesheet once, and then 
2. passing the result as a parameter to a (preferably pure) function that 
processes each file.

Currently this functionality is bundled up in `processCitations`. As I 
understand it, `processCitations` gets the name of the bibliography file 
from the document metadata (which the pandoc app has augmented with the 
value of the `--bibliography` argument). It then parses the bibliography 
file and uses that information to fill in the citations, which is why it 
needs to run in the IO monad.

Before I attempt the rather complex job of creating a refactored version 
of `processCitations`, is there an easier way that I have overlooked?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/32d808e8-3163-4d84-8158-9519db10f9c4n%40googlegroups.com.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Separating out the parsing of bibliography and stylesheet from the use of same
       [not found] ` <32d808e8-3163-4d84-8158-9519db10f9c4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-08-26 19:31   ` John MacFarlane
       [not found]     ` <1DBE1FBB-F4EB-4AB5-A1F6-C85CF2A6E197-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: John MacFarlane @ 2022-08-26 19:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

> On Aug 26, 2022, at 11:05 AM, Amy de Buitléir <amy-x92Y4IBCQKU6Cx7ujrKbww@public.gmane.org> wrote:
> 
> Before I attempt the rather complex job of creating a refactored version of `processCitations`, is there an easier way that I have overlooked?
> 

No, but I like the idea of factoring out the core of processCitations so the style and references don’t need to be reparsed.  If you do this, consider submitting a PR that exports the refactored function.

But note:  processCitations does NOT need to be run in the IO monad. It can be used in any instance of the PandocMonad class, including PandocPure.  (In this case you can use addToFileTree to create an ersatz file system including the style file and any external bibliography file, then run processCitations in PandocPure, using modifyPureState to add your ersatz file tree in stFiles.)  This is in fact what I do in pandoc-server, which runs everything, including citation processing, in PandocPure to avoid any security risks.
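
The approach described above might look roughly like the following. This is a minimal sketch, not the pandoc-server code: the file names `refs.bib`, `style.csl` and `input.md` and the helpers `withFiles` and `renderWithCitations` are assumptions made for illustration, and the input document is assumed to name the bibliography and style in its YAML metadata block.

```haskell
module Main (main) where

import Data.Text (Text)
import qualified Data.Text.IO as TIO
import Text.Pandoc
import Text.Pandoc.Citeproc (processCitations)

-- | Run a PandocPure action against an in-memory ("ersatz") file tree.
-- 'withFiles' is a hypothetical helper, not part of the pandoc API.
withFiles :: FileTree -> PandocPure a -> Either PandocError a
withFiles tree action = runPure $ do
  -- Install the file tree so that file reads inside PandocPure see it.
  modifyPureState $ \st -> st { stFiles = tree }
  action

-- | Markdown in, HTML out, with citations processed; no IO is involved.
-- The source text is expected to carry "bibliography: refs.bib" and
-- "csl: style.csl" in its YAML metadata block (assumed names).
renderWithCitations :: FileTree -> Text -> Either PandocError Text
renderWithCitations tree src = withFiles tree $ do
  doc  <- readMarkdown def { readerExtensions = pandocExtensions } src
  doc' <- processCitations doc
  writeHtml5String def doc'

main :: IO ()
main = do
  -- Each file is read from the real disk once and kept in memory.
  tree <- addToFileTree mempty "refs.bib"
            >>= \t -> addToFileTree t "style.csl"
  src  <- TIO.readFile "input.md"
  html <- handleError (renderWithCitations tree src)
  TIO.putStrLn html
```

A bulk processor could build `tree` once and map `renderWithCitations tree` over all inputs. Note that this only removes the IO and the repeated disk reads; the bibliography and style are still parsed on every `processCitations` call, which is what the proposed refactoring would address.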




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Separating out the parsing of bibliography and stylesheet from the use of same
       [not found]     ` <1DBE1FBB-F4EB-4AB5-A1F6-C85CF2A6E197-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-08-26 20:08       ` Amy de Buitléir
       [not found]         ` <cff9fa85-a698-4089-8aa7-8ecfadfd303fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Amy de Buitléir @ 2022-08-26 20:08 UTC (permalink / raw)
  To: pandoc-discuss


I will definitely submit a PR if I do the refactoring.

I knew about runPure, but I didn't realise about the file tree thing. Thank 
you!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Separating out the parsing of bibliography and stylesheet from the use of same
       [not found]         ` <cff9fa85-a698-4089-8aa7-8ecfadfd303fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-08-27 13:19           ` Amy de Buitléir
       [not found]             ` <1ba691cb-bbce-40a7-a68c-b6b4f31b1e3en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Amy de Buitléir @ 2022-08-27 13:19 UTC (permalink / raw)
  To: pandoc-discuss


In case this is useful to anyone, I created a simple example using the 
ersatz file tree approach. As far as I know, Google Groups doesn't support 
code formatting, so I put the code here: 
<https://gist.github.com/mhwombat/b3bccbfed4f3edba4042b0d10953b784>.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Separating out the parsing of bibliography and stylesheet from the use of same
       [not found]             ` <1ba691cb-bbce-40a7-a68c-b6b4f31b1e3en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-08-27 15:47               ` Amy de Buitléir
  0 siblings, 0 replies; 5+ messages in thread
From: Amy de Buitléir @ 2022-08-27 15:47 UTC (permalink / raw)
  To: pandoc-discuss


Forgot to mention, you also need to set the "csl" and "bibliography" 
variables in the document metadata.
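
For documents that do not already carry these fields, one way to set them programmatically is `setMeta` from `Text.Pandoc.Builder` (pandoc-types). The sketch below assumes the hypothetical file names `refs.bib` and `style.csl`, which would have to match the paths stored in the file tree; `withCitationMeta` is an illustrative helper name, not a pandoc function.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Text.Pandoc.Builder (setMeta)
import Text.Pandoc.Definition
  (MetaValue (..), Pandoc (..), lookupMeta, nullMeta)

-- | Add the "bibliography" and "csl" metadata fields to a document.
-- The file names are assumptions; use whatever paths the file tree holds.
withCitationMeta :: Pandoc -> Pandoc
withCitationMeta =
    setMeta "bibliography" (MetaString "refs.bib")
  . setMeta "csl" (MetaString "style.csl")

main :: IO ()
main = do
  -- Start from an empty document and inspect the resulting metadata.
  let Pandoc meta _ = withCitationMeta (Pandoc nullMeta [])
  print (lookupMeta "bibliography" meta)
  print (lookupMeta "csl" meta)
```

Applying `withCitationMeta` to each parsed document before calling `processCitations` keeps the input files free of per-file citation settings.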

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-08-27 15:47 UTC | newest]
