public inbox archive for pandoc-discuss@googlegroups.com
From: Julia Diaz <julia.diaz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: What is the point of isBlockElement in the JATS reader?
Date: Fri, 19 May 2023 15:22:28 -0700 (PDT)
Message-ID: <a71b20c8-7a6c-41ac-9af0-141b908111f7n@googlegroups.com>
In-Reply-To: <87cz2wo61j.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>



On Friday, 19 May 2023 at 07:32:51 UTC-5 Albert Krewinkel wrote:

> The JATS reader is based on the DocBook reader, AFAIK, and reuses a good
> bit of the DocBook code. The list of block tags in the DocBook reader is
> much longer, so this is most likely a leftover that could be simplified.


Looks like legacy from DocBook indeed. 

I just realised something else: as the JATS reader is written now,
isBlockElement never returns True.
This is because the only function that calls isBlockElement is parseMixed,
which is only used for the case of "p", and the JATS content model does
not allow "p" to contain another "p" element
<https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/p.html>. Thus
the only case that could make isBlockElement return True cannot occur in
a valid document.

In other words, as it is written now, not only is isBlockElement
pointless, parseMixed is too. Since isBlockElement is always False, the
rest is always empty
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L207>,
and lines 208-211
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L208-L211>
are never reached. So we could confidently parse the full contents of "p"
with parseInline alone, as done here
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L204>.
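
To make the argument concrete, here is a minimal sketch of the split that
parseMixed relies on. It is not the pandoc code: the Content type, the
shortened tag lists and the names splitMixed and validChildren are all
made up for illustration, and I am assuming that "p" is the only tag left
once the inline tags are subtracted from the candidates.

```haskell
import Data.List ((\\))

-- Toy stand-in for an XML node; NOT pandoc's real Content type.
data Content
  = Text String
  | Elem String [Content]  -- element name, children
  deriving (Show, Eq)

-- Hypothetical, heavily shortened tag lists (the real ones are much longer).
candidateTags, inlineTags :: [String]
candidateTags = ["p", "list", "fig", "tex-math"]
inlineTags    = ["list", "fig", "tex-math", "bold", "italic"]

-- Block tags are the candidates minus the inline tags; only "p" survives.
blockTags :: [String]
blockTags = candidateTags \\ inlineTags

isBlockElement :: Content -> Bool
isBlockElement (Elem name _) = name `elem` blockTags
isBlockElement _             = False

-- The split at the heart of a parseMixed-style function:
-- everything before the first block element, then the remainder.
splitMixed :: [Content] -> ([Content], [Content])
splitMixed = break isBlockElement

-- Children a valid JATS "p" might hold: anything except another "p".
validChildren :: [Content]
validChildren =
  [ Text "See "
  , Elem "bold" [Text "Figure 1"]
  , Elem "fig" []
  , Elem "list" []
  ]

main :: IO ()
main = do
  let (inlines, rest) = splitMixed validChildren
  print (length inlines)  -- 4: every child lands on the inline side
  print (null rest)       -- True: nothing is left for the block branch
  -- Only a nested "p", which the JATS model forbids, would leave a remainder.
  print (null (snd (splitMixed [Text "a", Elem "p" []])))  -- False
```

With valid input the remainder is always empty, so the branch that would
handle it is dead code and the whole thing reduces to mapping the inline
parser over the children.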

I would think something got mixed up when isBlockElement was adapted for
the JATS reader. I could not help but notice that the inline tags listed
in the isBlockElement function are almost identical to, and in exactly the
same order as, the list of allowed contents of "p" in the JATS
specification
<https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/p.html> (the
only ones missing are a few more recent elements, mostly Q&A elements,
that presumably did not exist when the JATS reader was first written). The
paragraphLevel list is also an exact copy of the "Paragraph-level Display
Elements" sublist on the same JATS specification page. It makes no sense
to me to define these lists separately only to filter them out immediately
and inevitably, especially when no record is kept of which list the
element in question belonged to, and only a context-less Boolean value is
ever returned...

