From: "'Peter Vedal Utnes' via pandoc-discuss"
To: pandoc-discuss
Newsgroups: gmane.text.pandoc
Subject: Re: Error caused by document length
Date: Mon, 27 Feb 2023 06:33:28 -0800 (PST)

I just did some further testing, and replaced the sections that I would
otherwise have removed with just as many words and paragraphs, but with no
symbols, only "test test test" etc. The document then works. So I was wrong
about the length: it must be some character or symbol producing the error
(only with pandoc, not other EPUB converters). Any idea how to further
isolate it, or how to circumvent it with a pandoc command or template?

Thanks for the help so far, Bernardo.

On Monday, 27 February 2023 at 15:23:57 UTC+1, Peter Vedal Utnes wrote:

> I am not sure what you mean by normalize in this context. I'll elaborate
> in case this is what you mean: In the interest of removing variables that
> might interfere with troubleshooting, I have copied the text from research
> papers (not just one, but a few), pasted it into Notepad, copied and pasted
> it back into a new Word file (this is more thorough than "clear
> formatting"), ran this "pure" file through pandoc, and I get the error. If I
> then randomly shorten the file, the error disappears. This is not the case
> for my "test" file, but only for research papers, which is baffling. I can
> only assume that pandoc responds to something like a character or in-text
> references in particular contexts, or, as was my original hypothesis, the
> number of lines or columns in the EPUB.
>
> On Monday, 27 February 2023 at 15:17:10 UTC+1,
> bernardov...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>
>> Have you tried editing the original research paper in some minor way
>> (adding or removing a couple of characters) and then running it? This is a
>> completely wild guess, but maybe the text in the file is getting normalized
>> upon editing it, whereas the original research paper still contains the
>> unedited, unnormalized text.
>>
>> On Mon, Feb 27, 2023 at 10:48 AM 'Peter Vedal Utnes' via pandoc-discuss <
>> pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>>
>>> I thank you for the suggestion. It is proving somewhat hard to
>>> (dis)confirm. I have made a test file with just the word "test" pasted
>>> over and over again, with and without various formatting, and of the same
>>> length as the proper papers or longer. This file consistently works. But
>>> when I attempt to do it with a regular research paper, it only works if I
>>> shorten it. Curiously, I can remove either half of the main text, or
>>> indeed sections here and there, randomly, and it works, but not with all
>>> of them present. I have combed it for special characters or tags, but
>>> cannot find any.
>>>
>>> On Monday, 27 February 2023 at 13:49:58 UTC+1, Bernardo C. D. A.
>>> Vasconcelos wrote:
>>>
>>>> I do not know the answer to this problem in particular, but perhaps it
>>>> is worth checking the main document *and* the bibliography for
>>>> invisible control characters (e.g. `\X{A0}`). They tend to cause all
>>>> sorts of strange problems that result in random error msgs.
>>>>
>>>> On Monday, February 27, 2023 at 8:16:20 AM UTC-3 Peter Vedal Utnes
>>>> wrote:
>>>>
>>>>> We have a workflow in Open Journal Systems where we use Pandoc to
>>>>> convert Word documents to EPUB, and then display them with an embedded
>>>>> EPUB app (Bibi).
>>>>>
>>>>> Our resulting EPUBs work fine with both debuggers and viewers like
>>>>> calibre. They work in Bibi, but only when they are reduced to a certain
>>>>> length. Whenever the files exceed approx. 100 lines or 600 words, Bibi
>>>>> claims:
>>>>>
>>>>> TypeError: Cannot read properties of undefined (reading ‘getAttribute’)
>>>>>
>>>>> Meanwhile, the same documents work when converted to EPUB using other
>>>>> converters, or when I reduce the length (length, not size in bytes; I've
>>>>> tried with graphics, and it still works). It suddenly works when I
>>>>> reduce the length by removing pure paragraph text, even though all the
>>>>> formatted elements (abstract, references, etc.) are the same.
>>>>>
>>>>> I recognize that this problem is very specific to the interrelation
>>>>> pandoc <-> Bibi, but I'd be grateful for general troubleshooting
>>>>> suggestions.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Peter

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/bc147d77-69c9-4e5d-82a6-e149f662a823n%40googlegroups.com.
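One possible way to follow up on the suggestion in the thread to look for invisible characters is to scan the generated EPUB itself rather than the Word source. The sketch below is only an illustration and is not something anyone in the thread posted. It relies on the fact that an EPUB is a ZIP archive, and it assumes that the offending character, if there is one, appears literally in the UTF-8 text of the XHTML, OPF, or NCX files. The function names `find_suspicious_chars` and `scan_epub`, and the choice of which Unicode categories count as "suspicious", are assumptions made for this example.

```python
import sys
import unicodedata
import zipfile

# Whitespace that is expected in ordinary markup and is not reported.
ALLOWED = {"\t", "\n", "\r", " "}

# Unicode general categories treated as suspicious here (an assumption):
# Cc = control, Cf = format (zero-width characters, soft hyphen),
# Zs = space separators other than the plain space (e.g. U+00A0, U+202F),
# Zl/Zp = line/paragraph separators, Co = private use.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp", "Co"}


def find_suspicious_chars(text):
    """Return (line, column, 'U+XXXX', name) for each suspicious character."""
    hits = []
    # Split on "\n" only, so that unusual line separators are still reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in ALLOWED:
                continue
            if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits


def scan_epub(path):
    """Scan the text members of an EPUB (a ZIP archive) and collect hits."""
    report = {}
    with zipfile.ZipFile(path) as epub:
        for member in epub.namelist():
            if not member.lower().endswith((".xhtml", ".html", ".opf", ".ncx")):
                continue
            text = epub.read(member).decode("utf-8", errors="replace")
            hits = find_suspicious_chars(text)
            if hits:
                report[member] = hits
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_epub.py paper.epub
    for member, hits in scan_epub(sys.argv[1]).items():
        for line_no, col_no, code, name in hits:
            print(f"{member}:{line_no}:{col_no}: {code} {name}")
```

Running this on both a failing and a working EPUB and comparing the reports might narrow down which character, if any, differs between them. Characters written as entities (for example `&#160;`) would not be caught by this check, and an empty report would suggest the cause lies elsewhere, such as in the markup structure.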