On Friday, 19 May 2023 at 07:32:51 UTC-5 Albert Krewinkel wrote:
The JATS reader is based on the DocBook reader, AFAIK, and reuses a good
bit of the DocBook code. The list of block tags in the DocBook reader is
much longer, so this is most likely a leftover that could be simplified.
Looks like legacy from DocBook indeed.
I just realised something else: as the JATS reader is written now, isBlockElement never returns True.
In other words, not only is isBlockElement pointless, parseMixed is too. Since isBlock is always False, the rest is always empty, and lines 208-211 are never reached. So we could, in all confidence, always parse the full contents of "p" with just parseInLine, as done here.
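To make the consequence concrete, here is a minimal sketch, assuming parseMixed separates inline from block children with a `break`-style split on the predicate (I have not reproduced the reader's actual code). The names `neverBlock` and `splitMixed` are hypothetical stand-ins, and the children are plain element names rather than real XML content.

```haskell
module Main where

-- Hypothetical stand-in for a predicate that never reports a block element.
neverBlock :: String -> Bool
neverBlock _ = False

-- Assumed shape of the split: the leading run of non-block children first,
-- then everything from the first block child onward.
splitMixed :: (String -> Bool) -> [String] -> ([String], [String])
splitMixed isBlock = break isBlock

main :: IO ()
main = do
  let children = ["bold", "fig", "italic", "list"]
      (inlines, rest) = splitMixed neverBlock children
  print inlines      -- prints ["bold","fig","italic","list"]
  print (null rest)  -- prints True
```

With a predicate that is always False, every child ends up in the inline part and the branch that handles the remainder has nothing to work on.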
I would think something got mixed up in the process when isBlockElement was adapted for the JATS reader. I could not help but notice that the inLineTags list in the isBlockElement function is almost identical, and in the exact same order, to the list of allowed contents of "p" in the JATS specification (the only ones missing are a few more recent elements, mostly Q&A elements, that presumably did not exist when the JATS reader was first written). The paragraphLevel list is also an exact copy of the "Paragraph-level Display Elements" sublist on the same JATS specification page. It makes no sense to me to define these separately only to filter them out immediately and inevitably, especially when no record is kept of which list the element in question belonged to, and only a context-less Boolean value is ever returned...
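As an illustration of why the result is always False, here is a small sketch, assuming the block tags are computed as a set difference between the block-like lists and the inline list. The tag lists below are shortened, hypothetical samples, not the reader's real lists, and `isBlockElementSketch` is my own name for the stand-in.

```haskell
module Main where

import qualified Data.Set as S

-- Shortened, hypothetical samples of the lists discussed above.
paragraphLevel, lists, inlineTags :: [String]
paragraphLevel = ["code", "fig", "graphic", "table-wrap"]
lists = ["def-list", "list"]
-- Every block-like tag also appears here, as allowed content of "p".
inlineTags = ["bold", "italic", "code", "fig", "graphic", "table-wrap", "def-list", "list"]

-- Assumed construction: block-like tags minus the inline tags.
blockTags :: S.Set String
blockTags = S.fromList (paragraphLevel ++ lists) `S.difference` S.fromList inlineTags

-- Membership test against the (empty) difference.
isBlockElementSketch :: String -> Bool
isBlockElementSketch name = name `S.member` blockTags

main :: IO ()
main = do
  print (S.null blockTags)  -- prints True
  print (any isBlockElementSketch (paragraphLevel ++ lists ++ inlineTags))  -- prints False
```

Because the inline list is a superset of the block-like lists in this sample, the difference is empty, and the membership test cannot succeed for any tag.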