public inbox archive for pandoc-discuss@googlegroups.com
* What is the point of isBlockElement in the JATS reader?
@ 2023-05-18 23:47 Julia Diaz
       [not found] ` <af881a3e-4f9a-4644-87b5-aaf93e6e959an-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Julia Diaz @ 2023-05-18 23:47 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 720 bytes --]

I just realised that this function returns True if the name of the element 
is "p" and False otherwise. Why do we need 27 lines 
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L106-L132> 
to do the same as: 

qName (elName e) == "p"

What am I missing?
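To make the comparison concrete, here is a small sketch of the "candidates 
minus exemptions" pattern next to the one-line check. The tag lists are 
hypothetical and heavily abbreviated, not copied from JATS.hs, and plain 
`String` names stand in for the `qName (elName e)` lookup. The point is only 
that once every candidate except "p" also appears in the exemption list, the 
two functions agree.

```haskell
import Data.List ((\\))

-- Hypothetical, heavily abbreviated tag lists for illustration only; the
-- real lists in JATS.hs are much longer. What matters is the shape: a list
-- of block candidates from which the inline tags are then subtracted.
blockCandidates :: [String]
blockCandidates = ["address", "fig", "list", "disp-quote", "p"]

inlineTags :: [String]
inlineTags = ["bold", "italic", "xref", "address", "fig", "list", "disp-quote"]

-- The "candidates minus exemptions" pattern.
isBlockName :: String -> Bool
isBlockName name = name `elem` (blockCandidates \\ inlineTags)

-- The one-line check proposed above. It is equivalent only as long as
-- the difference of the two lists is exactly ["p"].
isBlockName' :: String -> Bool
isBlockName' name = name == "p"
```

Loaded into GHCi, `blockCandidates \\ inlineTags` evaluates to `["p"]`, so 
both predicates give the same answer for every tag name.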

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/af881a3e-4f9a-4644-87b5-aaf93e6e959an%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 1626 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: What is the point of isBlockElement in the JATS reader?
       [not found] ` <af881a3e-4f9a-4644-87b5-aaf93e6e959an-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2023-05-19  7:50   ` ThomasH
       [not found]     ` <e8dfa46a-1f69-4fa6-a412-6751e2250cacn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: ThomasH @ 2023-05-19  7:50 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1346 bytes --]

It appears that every element type listed first is then removed again via 
the `inlinetags` list, except `p`. This does indeed look like a very 
complicated way to arrive at a list containing only `p`. A bug? Or maybe an 
attempt to follow a common pattern in Pandoc of listing candidates first 
and then subtracting some again where necessary? I have no knowledge of 
JATS, so I cannot comment on the semantics. Is there some kind of 
ambiguity, e.g. that the `address` element can appear both as a block *and* 
as an inline element?

-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e8dfa46a-1f69-4fa6-a412-6751e2250cacn%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 2685 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: What is the point of isBlockElement in the JATS reader?
       [not found]     ` <e8dfa46a-1f69-4fa6-a412-6751e2250cacn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2023-05-19 12:27       ` Albert Krewinkel
       [not found]         ` <87cz2wo61j.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Albert Krewinkel @ 2023-05-19 12:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


ThomasH <therch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> It appears that all element types first listed are then exempted
> again via the `inlinetags` list, except the `p` type. This looks
> indeed like a very complicated way to come up with a list of only
> `p`. - A bug? Or maybe an attempt to conform to a common pattern in
> Pandoc to list candidates first and then subtract them again if
> necessary?

The JATS reader is based on the DocBook reader, AFAIK, and reuses a good
bit of the DocBook code. The list of block tags in the DocBook reader is
much longer, so this is most likely a leftover that could be simplified.


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/87cz2wo61j.fsf%40zeitkraut.de.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: What is the point of isBlockElement in the JATS reader?
       [not found]         ` <87cz2wo61j.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2023-05-19 22:22           ` Julia Diaz
       [not found]             ` <a71b20c8-7a6c-41ac-9af0-141b908111f7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Julia Diaz @ 2023-05-19 22:22 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2781 bytes --]


On Friday, 19 May 2023 at 07:32:51 UTC-5 Albert Krewinkel wrote:

> The JATS reader is based on the DocBook reader, AFAIK, and reuses a good 
> bit of the DocBook code. The list of block tags in the DocBook reader is 
> much longer, so this is most likely a leftover that could be simplified.


Looks like a leftover from DocBook indeed. 

I just realised something else: as the JATS reader is currently written, 
isBlockElement never returns True. 
This is because the only function that calls isBlockElement is parseMixed, 
which is only called on the children of a "p" element, and by the JATS 
content model a "p" element cannot itself contain a nested "p" element 
<https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/p.html>. Thus 
the only input that could make isBlockElement return True cannot occur in 
valid JATS.

In other words, as currently written, not only is isBlockElement 
pointless, so is parseMixed. Since isBlockElement always returns False, the 
rest is always empty 
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L207>, 
and lines 208-211 
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L208-L211> 
are never reached. So we could confidently parse the full contents of "p" 
with parseInline alone, as is done here 
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L204>.
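The argument can be checked with a small pure model. This is a sketch under 
stated assumptions: `Content` and `Block` are hypothetical stand-ins for 
pandoc's XML and Builder types, `inlineWords` stands in for parseInline, and 
`parseMixed` only mirrors the control flow described above (split with 
`break isBlockElement`, turn the inline run into one paragraph, recurse on 
the rest). It is not the code from JATS.hs.

```haskell
-- Hypothetical stand-ins (not pandoc's XML or Builder types):
-- an element is a name plus its children.
data Content = Elem String [Content] | Text String
  deriving (Eq, Show)

-- Parsed stands in for whatever parseBlock would produce.
data Block = Para [String] | Parsed Content
  deriving (Eq, Show)

-- Observable behaviour of the current isBlockElement, as described in
-- this thread: only a "p" element counts as a block.
isBlockElement :: Content -> Bool
isBlockElement (Elem name _) = name == "p"
isBlockElement (Text _)      = False

-- Stand-in for parseInline: flatten inline content to words.
inlineWords :: Content -> [String]
inlineWords (Text t)      = words t
inlineWords (Elem _ kids) = concatMap inlineWords kids

-- Model of parseMixed's control flow: a run of inlines becomes one
-- paragraph; the first block element and everything after it are
-- handled separately.
parseMixed :: [Content] -> [Block]
parseMixed conts =
  let (ils, rest) = break isBlockElement conts
      ws = concatMap inlineWords ils
      p  = [Para ws | not (null ws)]
  in case rest of
       []       -> p
       (r : rs) -> p ++ [Parsed r] ++ parseMixed rs

-- The simplification suggested above: treat every child of <p> as inline.
parsePDirect :: [Content] -> [Block]
parsePDirect kids = [Para ws | not (null ws)]
  where ws = concatMap inlineWords kids

-- Invented children of a valid <p>: no nested <p>, so `rest` stays [].
sampleP :: [Content]
sampleP = [Text "See", Elem "xref" [Text "Fig. 1"], Text "for details."]
```

For `sampleP`, `break isBlockElement sampleP` leaves an empty rest, and both 
`parseMixed sampleP` and `parsePDirect sampleP` evaluate to 
`[Para ["See","Fig.","1","for","details."]]`.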

I think something got mixed up when isBlockElement was adapted for the 
JATS reader. I could not help but notice that the inlinetags list in 
isBlockElement is almost identical to, and in the same order as, the list 
of allowed contents of "p" in the JATS specification 
<https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/p.html> (missing 
only a few more recent elements, mostly Q&A elements, that presumably did 
not exist when the JATS reader was first written). The paragraphLevel list 
is also an exact copy of the "Paragraph-level Display Elements" sublist on 
the same JATS specification page. It makes no sense to me to define these 
lists separately only to filter them out immediately and inevitably, 
especially when no record is kept of which list the element in question 
belonged to, and only a context-free Boolean value is ever returned...

-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a71b20c8-7a6c-41ac-9af0-141b908111f7n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 3528 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: What is the point of isBlockElement in the JATS reader?
       [not found]             ` <a71b20c8-7a6c-41ac-9af0-141b908111f7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2023-05-20 17:00               ` Julia Diaz
  0 siblings, 0 replies; 5+ messages in thread
From: Julia Diaz @ 2023-05-20 17:00 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 4205 bytes --]

A further detail: 

Actually, the four lists that describe block elements ('paragraphLevel', 
'lists', 'mathML', and 'other') are taken directly from the JATS 
specification's content models for elements that can contain a <p> 
element, all of which share a similar structure. For example, the allowed 
contents of the <abstract> element are defined here 
<https://jats.nlm.nih.gov/archiving/tag-library/1.3/element/abstract.html> 
(click on Models and Context / Description / "Any combination of"). They 
overlap significantly with those of other elements that contain <p>, such 
as <ack>, <glossary>, <note>, etc.

So it seems that "block elements" is intended to mean the common set of 
elements allowed in elements that contain a <p>, while the inlinetags list 
holds the elements that can be contained inside a <p> element. I am not 
sure of the intent or rationale behind filtering the latter out of the 
former, but it seems to me that parseMixed at line 166 
<https://github.com/jgm/pandoc/blob/16f28ef5e945f3be14e05afb7d91f8adca18e49a/src/Text/Pandoc/Readers/JATS.hs#L166> 
was designed to be applied to containers of <p> rather than to <p> itself, 
and that the case "p" should simply do something along the lines of:

"p" -> para . trimInlines . mconcat <$> mapM parseInline (elContent e)

A bug?
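A minimal sketch of why parseMixed fits a container of <p> better: when the 
children include "p" elements, `break isBlockElement` actually splits the 
list and the block branch is taken. The types, `inlineWords` and 
`parseMixed` are hypothetical simplifications of the JATS reader, and 
`abstractKids` is an invented <abstract> body containing the whitespace text 
nodes typical of indented XML.

```haskell
-- Hypothetical stand-ins for pandoc's XML and Builder types.
data Content = Elem String [Content] | Text String
  deriving (Eq, Show)

-- Parsed stands in for whatever parseBlock would produce.
data Block = Para [String] | Parsed Content
  deriving (Eq, Show)

-- Current observable behaviour: only "p" counts as a block element.
isBlockElement :: Content -> Bool
isBlockElement (Elem name _) = name == "p"
isBlockElement (Text _)      = False

-- Stand-in for parseInline: flatten inline content to words.
inlineWords :: Content -> [String]
inlineWords (Text t)      = words t
inlineWords (Elem _ kids) = concatMap inlineWords kids

-- Model of parseMixed's control flow: inline runs become a paragraph
-- (whitespace-only runs produce nothing), block elements are parsed
-- separately and the remainder is processed recursively.
parseMixed :: [Content] -> [Block]
parseMixed conts =
  let (ils, rest) = break isBlockElement conts
      ws = concatMap inlineWords ils
      p  = [Para ws | not (null ws)]
  in case rest of
       []       -> p
       (r : rs) -> p ++ [Parsed r] ++ parseMixed rs

firstP, secondP :: Content
firstP  = Elem "p" [Text "First paragraph."]
secondP = Elem "p" [Text "Second", Elem "italic" [Text "paragraph."]]

-- Invented <abstract> body: two <p> children separated by
-- whitespace-only text nodes.
abstractKids :: [Content]
abstractKids = [Text "\n  ", firstP, Text "\n  ", secondP, Text "\n"]
```

Here `parseMixed abstractKids` evaluates to `[Parsed firstP, Parsed secondP]`: 
the whitespace-only runs yield no paragraph, and each <p> goes through the 
branch that is unreachable when parseMixed is only applied to the children 
of a <p>.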

-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/52eb7338-b175-4241-91cd-6c8ab998f837n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 6660 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-05-20 17:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-18 23:47 What is the point of isBlockElement in the JATS reader? Julia Diaz
     [not found] ` <af881a3e-4f9a-4644-87b5-aaf93e6e959an-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-05-19  7:50   ` ThomasH
     [not found]     ` <e8dfa46a-1f69-4fa6-a412-6751e2250cacn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-05-19 12:27       ` Albert Krewinkel
     [not found]         ` <87cz2wo61j.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2023-05-19 22:22           ` Julia Diaz
     [not found]             ` <a71b20c8-7a6c-41ac-9af0-141b908111f7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-05-20 17:00               ` Julia Diaz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).