From mboxrd@z Thu Jan 1 00:00:00 1970
From: Julia Diaz
Newsgroups: gmane.text.pandoc
Subject: Re: What is the point of isBlockElement in the JATS reader?
Date: Sat, 20 May 2023 10:00:39 -0700 (PDT)
Message-ID: <52eb7338-b175-4241-91cd-6c8ab998f837n@googlegroups.com>
References: <87cz2wo61j.fsf@zeitkraut.de>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_258_1432766451.1684602039875"
Xref: news.gmane.io gmane.text.pandoc:32665

------=_Part_258_1432766451.1684602039875
Content-Type: multipart/alternative; boundary="----=_Part_259_471620526.1684602039875"

------=_Part_259_471620526.1684602039875
Content-Type: text/plain; charset="UTF-8"

A further detail:

Actually, the four lists that describe block elements ('paragraphLevel', 'lists', 'mathML', and 'other') are taken directly from the JATS spec of any element that can contain a <p>
element, all of which share a similar structure. For example, the list of allowed contents for the <abstract> element is defined here (click on Models and Context/Description/"Any combination of"). It overlaps significantly with that of other elements containing <p>, such as <ack>, <glossary>, <note>, etc.

So it seems that "block elements" is intended to mean the common set of elements that are allowed in elements that contain a <p>. Then the 'inLine' tags are the list of elements that can be contained inside a <p> element. I am not sure of the intent or rationale of filtering out the latter from the former, but it seems to me that parseMixed in line 166 was designed to be applied to containers of <p>, rather than to <p> itself. If so, the "p" case should simply do something along the lines of:

    "p" -> para . trimInlines . mconcat <$> mapM parseInline (elContent e)

A bug?

On Friday, 19 May 2023 at 17:22:29 UTC-5 Julia Diaz wrote:

> On Friday, 19 May 2023 at 07:32:51 UTC-5 Albert Krewinkel wrote:
>
> > The JATS reader is based on the DocBook reader, AFAIK, and reuses a good
> > bit of the DocBook code. The list of block tags in the DocBook reader is
> > much longer, so this is most likely a leftover that could be simplified.
>
> Looks like legacy from DocBook indeed.
>
> I just realised something else: as the JATS reader is written now,
> isBlockElement never returns TRUE. This is because the only function that
> calls isBlockElement is parseMixed, which is only used for the case of "p",
> which by definition of the JATS models cannot itself contain an inner "p"
> element. Thus the only case that could possibly trigger a TRUE result for
> isBlockElement is impossible.
>
> In other words, as it is written now, not only is isBlockElement pointless,
> parseMixed is too. Since isBlock is always FALSE, the rest is always empty,
> and lines 208-211 are never reached. So we could in all confidence always
> parse the full contents of "p" just with parseInline, as done here.
>
> I would think something got mixed up in the process when isBlockElement was
> adapted for the JATS reader. I could not help but notice that the
> inLineTags list in the isBlockElement function is almost identical to, and
> in the exact same order as, the list of allowed contents of "p" in the JATS
> specification (the only ones missing are a few more recent elements, mostly
> Q&A elements, that presumably did not exist when the JATS reader was first
> written). The paragraphLevel list is also an exact copy of the
> "Paragraph-level Display Elements" sublist in the same JATS specification
> page.
> It makes no sense to me to define these separately only to filter them out
> immediately and inevitably, especially when no record is kept of which list
> the element in question belonged to, and only a context-less Boolean value
> is ever provided...

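To make the argument in the message above concrete, here is a minimal, self-contained sketch (not the reader's actual code). Everything in it is an assumption for illustration: the tag lists are small sample subsets rather than the real ones, isBlockTag and splitAtFirstBlock are hypothetical stand-ins for isBlockElement and for the split at the start of parseMixed (assumed here to behave like break), and Data.List's (\\) stands in for whatever difference operation the reader uses.

```haskell
import Data.List ((\\))

-- Illustrative sample subsets only; NOT the full lists from the JATS reader.
paragraphLevel, lists, mathML, other, inlineTags :: [String]
paragraphLevel = ["code", "fig", "table-wrap"]
lists          = ["def-list", "list"]
mathML         = ["tex-math", "mml:math"]
other          = ["p", "related-article", "x"]
inlineTags     = [ "bold", "italic", "x", "related-article"
                 , "code", "fig", "table-wrap"
                 , "def-list", "list", "tex-math", "mml:math" ]

-- Block tags are whatever is left once every inline tag has been removed.
blockTags :: [String]
blockTags = (paragraphLevel ++ lists ++ mathML ++ other) \\ inlineTags

-- Hypothetical stand-in for isBlockElement, working on bare tag names.
isBlockTag :: String -> Bool
isBlockTag tag = tag `elem` blockTags

-- Hypothetical stand-in for the split in parseMixed: everything before the
-- first block tag goes left, the first block tag and what follows go right.
splitAtFirstBlock :: [String] -> ([String], [String])
splitAtFirstBlock = break isBlockTag
```

With these sample lists, blockTags contains only "p", so for any contents without a "p" child splitAtFirstBlock returns the whole input as its first component and an empty rest. Which tags actually survive the filtering in the real reader depends on the full lists in the source.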
------=_Part_259_471620526.1684602039875-- ------=_Part_258_1432766451.1684602039875--