From: 'Peter Vedal Utnes' via pandoc-discuss
Newsgroups: gmane.text.pandoc
Subject: Re: Error caused by document length
Date: Tue, 28 Feb 2023 07:03:36 -0800 (PST)
Message-ID: <38c57c7c-ec9f-448a-a3e0-47f19d2c7dc3n@googlegroups.com>
To: pandoc-discuss

I have resolved the issue by sidestepping it: I now use another OJS
plugin that can also display embedded EPUBs. It is called epubjsviewer;
it is not on the plugin list in OJS, so I installed it manually from
GitHub.

I will include some final remarks on my troubleshooting here, in case
someone else searches for this or a similar issue. As you can see above,
for many of our research papers (formatted with Word templates from
various journals), the Bibi EPUB viewer refuses to load the EPUB unless
you remove or replace certain paragraphs, even though these paragraphs
are formatted with nothing but <p> tags and do not contain special
Unicode characters or (e.g.) Norwegian letters (æøå). Neither is it
related to in-text references, e.g. the --reference-list flag. It fails
to load in Opera, Edge and some - but not all - versions of Chrome. It
is not related to formatting. The sections of the document that don't
work DO work if inserted into an EPUB that was not generated by pandoc.
So there is some relation between the pandoc EPUB template and
combinations of text that Bibi (the commonly used OJS EPUB viewer) does
not tolerate. But it is neither the NAV document nor the stylesheet, as
I've replaced those to no avail. I have also tried numerous flags, such
as --wrap, normalize, TOC levels, section divisions and so on, and of
course all metadata. Further, it is troublesome to arrive at the
specific text by process of elimination, since there are evidently
multiple sentences that fail.

However, since the documents work in other viewers and are fine when
checked with EPUBCheck or Calibre, I have resorted to a different plugin
for embedded EPUBs. I have learned a lot from the feedback here, and
have improved our pandoc script, EPUB template and troubleshooting
procedure.

Thanks!

Peter

On Monday, 27 February 2023 at 18:10:33 UTC+1, William Lupton wrote:

> Maybe this is too obvious a comment, but it couldn't be the em-dashes,
> could it? Both your sentences below appear to have em-dashes. Try
> replacing them with hyphens?
>
> 1) Over years I have experienced much Bronze in the form of articles
> in toll access (TA) journals that have been made freely available for
> reading – not open access, but “Free access” as some publishers call
> it. 2) One thing is to help editors to become aware of the issue,
> another is to find practical solutions for them to transition their
> scholarly content to OA – the rest of their content is really not of
> interest to us.
>
> -->
>
> 1) Over years I have experienced much Bronze in the form of articles
> in toll access (TA) journals that have been made freely available for
> reading - not open access, but “Free access” as some publishers call
> it. 2) One thing is to help editors to become aware of the issue,
> another is to find practical solutions for them to transition their
> scholarly content to OA - the rest of their content is really not of
> interest to us.
>
> On Mon, 27 Feb 2023 at 16:49, 'Peter Vedal Utnes' via pandoc-discuss
> <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>
>> When I convert and try to publish a document with only the offending
>> sentences, it does indeed fail, Bastien. Even when the document is
>> otherwise empty. It is hard to see what might be causing this. I will
>> have to continue the elimination down to the word, but I've been at
>> this for nine hours and it is getting late. Will do that tomorrow.
>> Meanwhile, thanks for the help, all of you.
>>
>> On Monday, 27 February 2023 at 17:41:58 UTC+1, Bastien DUMONT wrote:
>>
>>> If you narrow down the document to the offending sentences (or only
>>> one of them), does Bibi fail to read the resulting EPUB? Such
>>> minimal source and EPUB documents would be easier to inspect, and
>>> the latter could even be included in a bug report for Bibi.
>>>
>>> On Monday 27 February 2023 at 08:22:34AM, 'Peter Vedal Utnes' via
>>> pandoc-discuss wrote:
>>> > I have now done the elimination process suggested by Bastien: I
>>> > took the working file (the EPUB of the research paper in which I
>>> > had swapped paragraphs 2-10 for "test test test") and put the
>>> > original paragraphs from the paper back in. It worked until I
>>> > tried to restore a sentence in the middle of paragraph 3, going
>>> > from above, or paragraph 6, going from below. When I insert the
>>> > next sentence at either end, the document fails to convert (in a
>>> > manner readable by the Bibi EPUB viewer). There do not seem to be
>>> > any Unicode characters that might interfere. I have run the
>>> > debugger you suggested, John, and there are indeed errors
>>> > (metadata not filled in and a missing end tag), but fixing these
>>> > does not seem to help.
>>> >
>>> > Here are the seemingly innocuous sentences that fail from above
>>> > and below, respectively: 1) Over years I have experienced much
>>> > Bronze in the form of articles in toll access (TA) journals that
>>> > have been made freely available for reading – not open access,
>>> > but “Free access” as some publishers call it. 2) One thing is to
>>> > help editors to become aware of the issue, another is to find
>>> > practical solutions for them to transition their scholarly
>>> > content to OA – the rest of their content is really not of
>>> > interest to us.
>>> >
>>> > There seem to be issues with a few other sentences in those 3
>>> > paragraphs too, but I can't see a pattern.
>>> > Here is the article in question, though it is only the PDF
>>> > galley; my EPUB testing is on a private server:
>>> > https://septentrio.uit.no/index.php/nopos/article/view/6665
>>> >
>>> > On Monday, 27 February 2023 at 17:08:31 UTC+1, John MacFarlane
>>> > wrote:
>>> >
>>> > You could try running epubcheck on the EPUB produced by pandoc,
>>> > to see if it points to anything.
>>> >
>>> > > On Feb 27, 2023, at 6:33 AM, 'Peter Vedal Utnes' via
>>> > > pandoc-discuss
>>> > > <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>>> > >
>>> > > I just did some further testing, and replaced the sections
>>> > > that I would otherwise have removed with as many words and
>>> > > paragraphs, but no signs, only "test test test" etc. The
>>> > > document then works. So I was wrong about the length: it must
>>> > > be some character or symbol producing the error (only with
>>> > > pandoc, not other EPUB converters). Any idea how to further
>>> > > isolate it, or how to circumvent it with a pandoc command or
>>> > > template?
>>> > >
>>> > > Thanks for the help so far, Bernardo.
>>> > >
>>> > > On Monday, 27 February 2023 at 15:23:57 UTC+1, Peter Vedal
>>> > > Utnes wrote:
>>> > > I am not sure what you mean by normalize in this context. I'll
>>> > > elaborate in case this is what you mean: in the interest of
>>> > > removing variables that might interfere with troubleshooting,
>>> > > I have copied the text from research papers (not just one, but
>>> > > a few), pasted it into Notepad, copied and pasted it back into
>>> > > a new Word file (this is more thorough than "clear
>>> > > formatting"), and run this "pure" file through pandoc, and I
>>> > > still get the error. If I then randomly shorten the file, the
>>> > > error disappears. This is not the case for my "test" file, but
>>> > > only for research papers, which is baffling. I can only assume
>>> > > that pandoc responds to something like a character or in-text
>>> > > references in particular contexts, or, as was my original
>>> > > hypothesis, the number of lines or columns in the EPUB.
>>> > >
>>> > > On Monday, 27 February 2023 at 15:17:10 UTC+1,
>>> > > bernardov...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>>> > > Have you tried editing the original research paper in some
>>> > > minor way (adding or removing a couple of characters) and then
>>> > > running it? This is a completely wild guess, but maybe the
>>> > > text in the files is getting normalized upon editing them,
>>> > > whereas the original research paper still contains the
>>> > > unedited, unnormalized text.
>>> > >
>>> > > On Mon, Feb 27, 2023 at 10:48 AM 'Peter Vedal Utnes' via
>>> > > pandoc-discuss
>>> > > <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>>> > > I thank you for the suggestion. It is proving somewhat hard to
>>> > > (dis)confirm. I have made a test file with just the word
>>> > > "test" pasted over and over again, with and without various
>>> > > formatting, and with the same length as the proper papers or
>>> > > longer. This file consistently works. But when I attempt to do
>>> > > it with a regular research paper, it only works if I shorten
>>> > > it. Curiously, I can remove either half of the main text, or
>>> > > indeed sections here and there, randomly, and it works, but
>>> > > not with all of them present. I have combed it for special
>>> > > characters or tags, but cannot find any.
>>> > >
>>> > > On Monday, 27 February 2023 at 13:49:58 UTC+1, Bernardo C. D.
>>> > > A. Vasconcelos wrote:
>>> > > I do not know the answer to this problem in particular, but
>>> > > perhaps it is worth checking the main document and the
>>> > > bibliography for invisible control characters (e.g.
>>> > > `\X{A0}`). They tend to cause all sorts of strange problems
>>> > > that result in random error messages.
>>> > >
>>> > > On Monday, February 27, 2023 at 8:16:20 AM UTC-3 Peter Vedal
>>> > > Utnes wrote:
>>> > > We have a workflow in Open Journal Systems where we use pandoc
>>> > > to convert Word documents to EPUB, and then display them with
>>> > > an embedded EPUB app (Bibi).
>>> > >
>>> > > Our resulting EPUBs work fine with both debuggers and viewers
>>> > > like Calibre. They work in Bibi, but only when they are
>>> > > reduced to a certain length. Whenever the files exceed approx.
>>> > > 100 lines or 600 words, Bibi reports:
>>> > >
>>> > > TypeError: Cannot read properties of undefined (reading
>>> > > 'getAttribute')
>>> > >
>>> > > Meanwhile, the same documents work when converted to EPUB
>>> > > using other converters, or when I reduce the length (length,
>>> > > not size in bytes: I've tried with graphics, and it still
>>> > > works). It suddenly works when I reduce the length by removing
>>> > > pure paragraph text, even though all the formatted elements
>>> > > (abstract, references, etc.) are the same.
>>> > >
>>> > > I recognize that this problem is very specific to the
>>> > > interrelation pandoc <-> Bibi, but I'd be grateful for general
>>> > > troubleshooting suggestions.
>>> > >
>>> > > Thanks in advance,
>>> > >
>>> > > Peter

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/38c57c7c-ec9f-448a-a3e0-47f19d2c7dc3n%40googlegroups.com.
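A note for anyone retracing the character checks suggested in this thread (invisible characters such as U+00A0, and the dashes in the two quoted sentences): in the raw message those dashes are encoded as U+2013 EN DASH, and the quotation marks around "Free access" as U+201C and U+201D. Whether any of these characters is what Bibi trips over was not established in the thread. The sketch below only lists the non-ASCII and control characters in a piece of text and offers ASCII stand-ins for a retest. The function names and the replacement table are hypothetical, not part of pandoc, Bibi or epubcheck.

```python
import unicodedata

# Hypothetical replacement table for retesting; extend it as needed.
ASCII_FALLBACKS = {
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u00a0": " ",   # NO-BREAK SPACE
    "\u202f": " ",   # NARROW NO-BREAK SPACE
}


def find_suspect_characters(text):
    """Return (offset, character, Unicode name) for each suspect character.

    A character counts as suspect if it is outside ASCII, or if it is a
    control/format character other than newline, carriage return or tab.
    """
    hits = []
    for offset, ch in enumerate(text):
        is_control = (unicodedata.category(ch) in ("Cc", "Cf")
                      and ch not in "\n\r\t")
        if ord(ch) > 127 or is_control:
            hits.append((offset, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


def to_ascii_punctuation(text):
    """Replace only the characters listed in ASCII_FALLBACKS."""
    return text.translate(str.maketrans(ASCII_FALLBACKS))


if __name__ == "__main__":
    sentence = ("made freely available for reading \u2013 not open access, "
                "but \u201cFree access\u201d as some publishers call it.")
    for offset, ch, name in find_suspect_characters(sentence):
        print(f"offset {offset}: U+{ord(ch):04X} {name}")
    print(to_ascii_punctuation(sentence))
```

Running the scan on the text copied out of the Word file, and then converting an ASCII-only variant of the same text, would show whether the failure follows these characters or not.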

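The elimination process described in the thread (swapping paragraphs for "test test test" and restoring them until the viewer fails) can be organised as a recursive halving, which also copes with the observation that more than one sentence fails. The following is a minimal sketch under stated assumptions: `fails` is a hypothetical callback supplied by the person testing (for example: convert the chunk with pandoc, load the result in the viewer, and report whether it broke), and the stand-in predicate in the demo is invented purely so the example runs.

```python
def minimal_failing_chunks(units, fails):
    """Narrow a list of sentences or paragraphs down to small failing chunks.

    units: list of text units (sentences or paragraphs).
    fails: hypothetical callback; fails(chunk) returns True if a document
           built from that chunk alone does not load.
    Returns a list of chunks (each a list of units). A chunk with a single
    unit fails on its own; a larger chunk fails only as a combination,
    because neither of its halves fails separately.
    """
    if not units or not fails(units):
        return []
    if len(units) == 1:
        return [list(units)]
    mid = len(units) // 2
    found = (minimal_failing_chunks(units[:mid], fails)
             + minimal_failing_chunks(units[mid:], fails))
    # If neither half fails alone, the failure needs the combination.
    return found or [list(units)]


if __name__ == "__main__":
    # Stand-in predicate for demonstration only: pretend that any chunk
    # containing an en dash fails to load.
    def demo_fails(chunk):
        return any("\u2013" in unit for unit in chunk)

    sentences = ["a", "b \u2013 c", "d", "e \u2013 f", "g"]
    for chunk in minimal_failing_chunks(sentences, demo_fails):
        print(chunk)
```

Each call to `fails` stands for one manual convert-and-load round, so chunks that load cleanly are discarded as a whole instead of being tested sentence by sentence.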