From: 'Peter Vedal Utnes' via pandoc-discuss
Newsgroups: gmane.text.pandoc
Subject: Re: Error caused by document length
Date: Mon, 27 Feb 2023 08:22:34 -0800 (PST)
Message-ID: <20942a45-0995-4a50-888a-cf25e9895920n@googlegroups.com>

I have now done the elimination process, as suggested by Bastien. Starting from the working file, which was the EPUB of the research paper where I had swapped paragraphs 2-10 with "test test test", I put the original paragraphs from the paper back in. It worked until I tried to restore a sentence in the middle of paragraph 3, going from above, or paragraph 6, going from below. When I insert the next sentence at either end, the document fails to convert (in a manner readable by the Bibi EPUB viewer). There do not seem to be any Unicode characters that might interfere. I have run the debugger you suggested, John, and there are indeed errors (metadata not filled in and a missing end tag), but fixing these does not seem to help.

Here are the seemingly innocuous sentences that fail from above and below, respectively:

1) Over years I have experienced much Bronze in the form of articles in toll access (TA) journals that have been made freely available for reading – not open access, but “Free access” as some publishers call it.

2) One thing is to help editors to become aware of the issue, another is to find practical solutions for them to transition their scholarly content to OA – the rest of their content is really not of interest to us.

There seem to be issues with a few other sentences in those 3 paragraphs too, but I can't see a pattern.

Here is the article in question, though it is only the PDF galley; my EPUB testing is on a private server:
https://septentrio.uit.no/index.php/nopos/article/view/6665

On Monday, 27 February 2023 at 17:08:31 UTC+1, John MacFarlane wrote:
> You could try running epubcheck on the epub produced by pandoc, to see if it points to anything.
>
> > On Feb 27, 2023, at 6:33 AM, 'Peter Vedal Utnes' via pandoc-discuss <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
> >
> > I just did some further testing, and replaced the sections that I would otherwise have removed with as many words and paragraphs, but no symbols, only "test test test" etc. The document then works. So I was wrong about the length: it must be some character or symbol producing the error (only with pandoc, not other EPUB converters). Any idea how to further isolate it, or how to circumvent it with a pandoc command or template?
> >
> > Thanks for the help so far, Bernardo.
> >
> > On Monday, 27 February 2023 at 15:23:57 UTC+1, Peter Vedal Utnes wrote:
> > I am not sure what you mean by normalize in this context. I'll elaborate in case this is what you mean: In the interest of removing variables that might interfere with troubleshooting, I have copied the text from research papers (not just one, but a few), pasted it into Notepad, copied and pasted it back into a new Word file (this is more thorough than "clear formatting"), run this "pure" file through pandoc, and I get the error. If I then randomly shorten the file, the error disappears. This is not the case for my "test" file, but only for research papers, which is baffling. I can only assume that pandoc responds to something like a character or in-text references in particular contexts, or, as was my original hypothesis, the number of lines or columns in the EPUB.
> >
> > On Monday, 27 February 2023 at 15:17:10 UTC+1, bernardov...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> > Have you tried editing the original research paper in some minor way (adding or removing a couple of characters) and then running it? This is a completely wild guess, but maybe the text in the files is getting normalized upon editing them, whereas the original research paper still contains the unedited, unnormalized text.
> >
> > On Mon, Feb 27, 2023 at 10:48 AM 'Peter Vedal Utnes' via pandoc-discuss <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
> > I thank you for the suggestion. It is proving somewhat hard to (dis)confirm. I have made a test file with just the word "test" pasted over and over again, with and without various formatting, and with the same length as the proper papers or longer. This file consistently works. But when I attempt to do it with a regular research paper, it only works if I shorten it. Curiously, I can remove either half of the main text, or indeed sections here and there, randomly, and it works, but not with all of them present. I have combed it for special characters or tags, but cannot find any.
> >
> > On Monday, 27 February 2023 at 13:49:58 UTC+1, Bernardo C. D. A. Vasconcelos wrote:
> > I do not know the answer to this problem in particular, but perhaps it is worth checking the main document and the bibliography for invisible control characters (e.g. `\X{A0}`). They tend to cause all sorts of strange problems that result in random error msgs.
> >
> > On Monday, February 27, 2023 at 8:16:20 AM UTC-3 Peter Vedal Utnes wrote:
> > We have a workflow in Open Journal Systems where we use Pandoc to convert Word documents to EPUB, and then display them with an embedded EPUB app (Bibi).
> >
> > Our resulting EPUBs work fine with both debuggers and viewers like calibre. They work in Bibi, but only when they are reduced to a certain length. Whenever the files exceed approx. 100 lines or 600 words, Bibi claims:
> >
> > TypeError: Cannot read properties of undefined (reading 'getAttribute')
> >
> > Meanwhile, the same documents work when converted to EPUB using other converters, or when I reduce the length (length, not size in bytes; I've tried with graphics, and it still works). It suddenly works when I reduce the length by removing pure paragraph text, even though all the formatted elements (abstract, references, etc.) are the same.
> >
> > I recognize that this problem is very specific to the interrelation pandoc <-> Bibi, but I'd be grateful for general troubleshooting suggestions.
> >
> > Thanks in advance,
> >
> > Peter

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/20942a45-0995-4a50-888a-cf25e9895920n%40googlegroups.com.
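For anyone repeating the elimination process described in this message, here is a minimal Python sketch of the bookkeeping. It restores original sentences one at a time from the top, then from the bottom, and reports where the first failure appears. All names here are hypothetical. In particular `works` stands in for the real check (convert with pandoc, load in Bibi, see whether it opens), which the thread does by hand, and `toy_works` is only a made-up stand-in so the sketch runs.

```python
from typing import Callable, List, Optional

PLACEHOLDER = "test test test"  # filler text, as used in the thread


def first_failure_from_top(
    sentences: List[str], works: Callable[[List[str]], bool]
) -> Optional[int]:
    """Restore originals one by one from the top; return the index of the
    first sentence whose restoration makes works() fail, or None."""
    n = len(sentences)
    for i in range(n):
        candidate = sentences[: i + 1] + [PLACEHOLDER] * (n - i - 1)
        if not works(candidate):
            return i
    return None


def first_failure_from_bottom(
    sentences: List[str], works: Callable[[List[str]], bool]
) -> Optional[int]:
    """Same idea, restoring originals one by one from the bottom."""
    n = len(sentences)
    for i in range(n - 1, -1, -1):
        candidate = [PLACEHOLDER] * i + sentences[i:]
        if not works(candidate):
            return i
    return None


def toy_works(doc: List[str]) -> bool:
    """Made-up stand-in for the manual pandoc + Bibi check: pretend the
    viewer breaks once three or more original sentences are present."""
    return sum(s != PLACEHOLDER for s in doc) < 3


if __name__ == "__main__":
    demo = ["s0", "s1", "s2", "s3", "s4", "s5"]
    print(first_failure_from_top(demo, toy_works))
    print(first_failure_from_bottom(demo, toy_works))
```

If the two directions stop at different sentences, as reported above, that hints the failure may depend on how much original text is present in total rather than on one specific sentence, though the sketch alone cannot establish that.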

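The suggestion in the thread to look for invisible or unusual characters can be made mechanical. The sketch below is one possible approach in Python, not something taken from pandoc or Bibi, and `report_non_ascii` is a hypothetical helper name. It lists every non-ASCII or control character in a string together with its code point and Unicode name. As quoted in this message, the two failing sentences contain an en dash and curly quotation marks; whether those characters play any role in the Bibi error is not established here.

```python
import unicodedata
from typing import List, Tuple


def report_non_ascii(text: str) -> List[Tuple[int, str, str, str]]:
    """Return (offset, code point, category, name) for every character that
    is non-ASCII or an ASCII control character other than newline,
    carriage return, or tab."""
    findings = []
    for offset, ch in enumerate(text):
        is_control = unicodedata.category(ch) == "Cc" and ch not in "\n\r\t"
        if ord(ch) > 127 or is_control:
            findings.append(
                (
                    offset,
                    f"U+{ord(ch):04X}",
                    unicodedata.category(ch),
                    unicodedata.name(ch, "<unnamed>"),
                )
            )
    return findings


if __name__ == "__main__":
    # Fragment of failing sentence 1) as it appears in this message.
    sample = "for reading \u2013 not open access, but \u201cFree access\u201d"
    for offset, code, category, name in report_non_ascii(sample):
        print(offset, code, category, name)
```

Running the same function over the text extracted from the Word file, or over the XHTML files inside the EPUB, would show whether the failing passages share a character that the "test test test" filler lacks.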