From: Julien Dutant
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: Mixed Rawblock / Plain as Para in JATS output
Date: Mon, 18 Apr 2022 07:02:24 -0700 (PDT)
Message-ID: <633d80cc-6595-4998-8f30-f15765ef1265n@googlegroups.com>
References: <8589f717-fe73-4813-9126-6beba194a3f1n@googlegroups.com>

Just noticed that Bastien Dumont faced the same issue writing a filter for
ODT output: https://github.com/jgm/pandoc/issues/7262 .

On Monday, April 18, 2022 at 2:15:06 PM UTC+1 Julien Dutant wrote:

> And a related question: shouldn't the WriterOptions Lua constructor
> include a `standalone` field?
>
> On Monday, April 18, 2022 at 2:10:12 PM UTC+1 Julien Dutant wrote:
>
>> Hi all,
>>
>> I'm writing a new Lua filter to handle theorems and the like in various
>> formats including JATS (https://github.com/jdutant/statement). Given the
>> following markdown and a suitable bibliography:
>>
>> ```markdown
>> ::: theorem
>> (from @article) Some very interesting fact holds.
>> :::
>> ```
>>
>> The filter is trying to generate a JATS statement (
>> https://jats.nlm.nih.gov/archiving/tag-library/1.1/element/statement.html
>> ):
>> ```xml
>> <statement>
>> <label>Theorem 1.1</label>
>> <title>from Doe, J (2003)</title>
>> <p>Some very interesting fact holds.</p>
>> </statement>
>> ```
>>
>> The problem I encounter is that the JATS writer converts pandoc.Plain to
>> <p> blocks. So if I build an inlines list:
>> inlines = { pandoc.RawInline('jats','<title>'), ... more inlines ...,
>> pandoc.RawInline('jats','</title>')}
>>
>> and try to insert it in the document with:
>> blocks:insert( pandoc.Plain(inlines) )
>>
>> I get the unintended output:
>> ```xml
>> <p><title>from Doe, J (2003)</title></p>
>> ```
>>
>> Now I could of course stringify the title inlines first, add the title
>> tags and insert the result in a RawBlock. But that's bad too, because the
>> title inlines may contain things that shouldn't be stringified yet, such
>> as a citation.
>>
>> Is it necessary for the JATS writer to turn Plain elements into <p> ones?
>> The HTML one doesn't, after all.
>>
>> At the moment my best approach is to use `pandoc.write` (thanks so much
>> Albert for giving us this!) to convert the inlines on the spot. There are
>> some issues with this though.
>>
>> * If something in the inlines needs to be handled by another filter, it
>> won't be handled properly. For instance, a pandoc-crossref cross-reference
>> will be turned into plain text and become unretrievable. AFAIK my filter
>> can't tell which other filters are run, and can't pass them as
>> WriterOptions anyway.
>> * If I don't pass PANDOC_WRITER_OPTIONS, pandoc.write won't know which
>> citation method (`cite_method`) to use, and may lack other relevant
>> settings to write the inlines.
>> * If I pass all of PANDOC_WRITER_OPTIONS, pandoc.write uses standalone
>> mode if the document is in standalone mode, so I get a whole preamble
>> within my label. Yet there is no standalone setting in WriterOptions, and
>> I can't unset it (I've tried to remove `template` but it didn't work).
>> * So my best guess so far is to create a new WriterOptions by copying
>> only those fields of PANDOC_WRITER_OPTIONS that might be useful to format
>> the inlines.
>>
>> All in all, it's pretty heavy-handed just to handle a label of inlines.
>> Is there a better approach? Any chance that the JATS writer could convert
>> Plain blocks to plain blocks without <p> tags?
>>
>> Best,
>> J

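For what it's worth, the last workaround in the list above (a fresh WriterOptions holding only selected fields) could look roughly like the sketch below. It is only a sketch under assumptions: it needs a pandoc version that provides `pandoc.write` and `PANDOC_WRITER_OPTIONS`, the helper name `title_block` is made up for illustration, and the choice of fields to copy (`cite_method`, `wrap_text`, `columns`) is a guess at what matters for inlines. It also assumes the JATS writer wraps the single Plain block in one `<p>...</p>` pair, as described above.

```lua
-- Hypothetical helper (name is illustrative): render a list of inlines to
-- JATS with pandoc.write and wrap the result in <title> tags.
local function title_block(inlines)
  -- Copy only a few fields rather than all of PANDOC_WRITER_OPTIONS, so
  -- that no template is carried over and the output stays a fragment.
  local opts = pandoc.WriterOptions{
    cite_method = PANDOC_WRITER_OPTIONS.cite_method,
    wrap_text = PANDOC_WRITER_OPTIONS.wrap_text,
    columns = PANDOC_WRITER_OPTIONS.columns,
  }
  local doc = pandoc.Pandoc({ pandoc.Plain(inlines) })
  local jats = pandoc.write(doc, 'jats', opts)
  -- Assumption: the writer emitted <p>...</p> around the Plain block;
  -- strip that outer pair before adding the <title> tags.
  jats = jats:gsub('^%s*<p>', ''):gsub('</p>%s*$', '')
  return pandoc.RawBlock('jats', '<title>' .. jats .. '</title>')
end

-- Possible use inside a filter (the inlines are made up for illustration):
-- blocks:insert(title_block({
--   pandoc.Str('from'), pandoc.Space(), pandoc.Str('Doe')
-- }))
```

This does nothing about the first issue in the list: anything in the inlines that another filter was meant to process (a pandoc-crossref reference, say) is still written out before that filter sees it.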