public inbox archive for pandoc-discuss@googlegroups.com
* Mixed Rawblock / Plain as Para in JATS output
From: Julien Dutant @ 2022-04-18 13:10 UTC
  To: pandoc-discuss


Hi all,

I'm writing a new Lua filter to handle theorems and the like in various
formats, including JATS (https://github.com/jdutant/statement). Given the
following markdown and a suitable bibliography:

```markdown
::: theorem
(from @article) Some very interesting fact holds.
:::
```

The filter tries to generate a JATS statement
(https://jats.nlm.nih.gov/archiving/tag-library/1.1/element/statement.html):
```xml
<statement>
<label>Theorem 1.1</label>
<title>from Doe, J (2003)</title>
<p>Some very interesting fact holds.</p>
</statement>
```

The problem I encounter is that the JATS writer converts pandoc.Plain to
<p> blocks. So if I build a list of inlines:
```lua
inlines = { pandoc.RawInline('jats', '<title>'), --[[ ... more inlines ... ]]
  pandoc.RawInline('jats', '</title>') }
```

and try to insert it in the document with:
```lua
blocks:insert(pandoc.Plain(inlines))
```

I get the unintended output:
```xml
<p><title>from Doe, J (2003)</title></p>
```

Now I could of course stringify the title inlines first, add the title tags
and insert the result in a RawBlock. But that's bad too, because the title
inlines may contain elements that shouldn't be stringified yet, such as a
citation.

Is it necessary for the JATS writer to turn Plain elements into <p> ones?
The HTML writer doesn't, after all.

At the moment my best approach is to use `pandoc.write` (thanks so much
Albert for giving us this!) to convert the inlines on the spot. There are
some issues with this, though.

* If something in the inlines needs to be handled by another filter, it
won't be handled properly. For instance, a pandoc-crossref cross-reference
will be turned into plain text and become unretrievable. AFAIK my filter
can't tell which other filters are being run, and couldn't pass them via
WriterOptions anyway.
* If I don't pass PANDOC_WRITER_OPTIONS, pandoc.write won't know which
citation method (`cite_method`) to use, and may lack other settings
relevant to writing the inlines.
* If I pass all of PANDOC_WRITER_OPTIONS, pandoc.write uses standalone mode
whenever the document is in standalone mode, so I get a whole preamble
within my label. Yet there is no standalone setting in WriterOptions, so I
can't unset it (I've tried removing `template`, but it didn't work).
* So my best guess so far is to create a new WriterOptions by copying only
those fields of PANDOC_WRITER_OPTIONS that might be useful for formatting
the inlines.
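
A minimal sketch of the "copy only the useful fields" workaround from the last bullet, written as Lua filter code. It assumes pandoc 2.17 or later (for `pandoc.write`, `pandoc.WriterOptions` and `PANDOC_WRITER_OPTIONS`) and that a WriterOptions built without a `template` renders a fragment rather than a standalone document. The helper names (`pick_fields`, `strip_para`, `inlines_to_jats`) and the choice of fields (`cite_method`, `wrap_text`) are illustrative assumptions, not part of pandoc; the `<p>` stripping relies on the JATS writer behaviour described above. As the first bullet notes, anything another filter would process later is still lost.

```lua
-- Sketch only. Assumes pandoc >= 2.17 and that a WriterOptions without a
-- `template` yields a non-standalone fragment. Helper names are hypothetical.

-- Copy only the listed fields from a WriterOptions-like object into a table.
local function pick_fields(src, keys)
  local out = {}
  for _, key in ipairs(keys) do
    out[key] = src[key]
  end
  return out
end

-- Remove one enclosing <p>...</p> pair (and surrounding whitespace), since
-- the JATS writer reportedly renders a Plain block as <p>.
local function strip_para(s)
  local inner = s:match('^%s*<p>(.*)</p>%s*$')
  return inner or s
end

-- Render inlines to a JATS string. Which fields to copy is an assumption.
local function inlines_to_jats(inlines)
  local opts = pandoc.WriterOptions(
    pick_fields(PANDOC_WRITER_OPTIONS, { 'cite_method', 'wrap_text' })
  )
  local doc = pandoc.Pandoc({ pandoc.Plain(inlines) })
  return strip_para(pandoc.write(doc, 'jats', opts))
end
```

The returned string could then be wrapped as `pandoc.RawBlock('jats', '<title>' .. inlines_to_jats(title_inlines) .. '</title>')`, where `title_inlines` is a hypothetical list holding the title's inlines.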

All in all, it's pretty heavy-handed just to handle a label made of inlines.
Is there a better approach? Any chance that the JATS writer could render
Plain blocks without <p> tags?

Best,
J

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/8589f717-fe73-4813-9126-6beba194a3f1n%40googlegroups.com.

* Re: Mixed Rawblock / Plain as Para in JATS output
From: Julien Dutant @ 2022-04-18 13:15 UTC
  To: pandoc-discuss


And a related question: shouldn't the WriterOptions Lua constructor include
a `standalone` field?

* Re: Mixed Rawblock / Plain as Para in JATS output
From: Julien Dutant @ 2022-04-18 14:02 UTC
  To: pandoc-discuss


Just noticed that Bastien Dumont faced the same issue while writing a filter
for ODT output: https://github.com/jgm/pandoc/issues/7262 .

* Re: Mixed Rawblock / Plain as Para in JATS output
From: Bastien DUMONT @ 2022-04-18 14:03 UTC
  To: pandoc-discuss

Hi! You may be interested in the following discussion: https://github.com/jgm/pandoc/issues/7262, which was originally about DOCX but has become more general. (Sadly, it is not even possible to use pandoc.write to achieve the desired result for DOCX.)

* Re: Mixed Rawblock / Plain as Para in JATS output
From: Julien Dutant @ 2022-04-18 15:56 UTC
  To: pandoc-discuss


Thanks, Bastien. Indeed, I've now posted there.
