From: "Bernardo C. D. A. Vasconcelos"
Newsgroups: gmane.text.pandoc
Subject: Re: Error caused by document length
Date: Mon, 27 Feb 2023 07:45:19 -0800 (PST)
Message-ID: <1691e374-df1e-46a4-b4b4-8213b5d3c16en@googlegroups.com>
To: pandoc-discuss

Also, maybe this could help: https://www.soscisurvey.de/tools/view-chars.php
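
For a local check along the same lines as that tool, here is a minimal Python sketch that lists characters which are usually invisible in an editor. The function name `find_hidden_chars` and the choice of Unicode categories are my own assumptions, not anything pandoc or Bibi prescribes; the text to scan would be whatever you paste in or read from a file.

```python
import unicodedata

# General categories that are usually invisible in an editor:
# Cc = control, Cf = format (zero-width, bidi marks), Zl/Zp = line/paragraph separators.
HIDDEN_CATEGORIES = {"Cc", "Cf", "Zl", "Zp"}
# Ordinary whitespace that should not be reported.
ALLOWED = {"\n", "\r", "\t"}


def find_hidden_chars(text):
    """Return (line, column, 'U+XXXX', name) for each invisible or unusual-space character."""
    hits = []
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line, col = line + 1, 0
            continue
        col += 1
        category = unicodedata.category(ch)
        # Space separators other than the plain space, e.g. U+00A0 NO-BREAK SPACE.
        unusual_space = category == "Zs" and ch != " "
        if ch not in ALLOWED and (category in HIDDEN_CATEGORIES or unusual_space):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((line, col, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    sample = "plain text\nnon\u00a0breaking and zero\u200bwidth\n"
    for hit in find_hidden_chars(sample):
        print(hit)
```

The scan walks the text character by character instead of using `str.splitlines()`, because `splitlines()` itself treats some of these characters (for example U+2028) as line breaks and would swallow them.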


On Monday, February 27, 2023 at 11:54:52 AM UTC-3 Bastien DUMONT wrote:
Maybe you could restore the paragraphs you replaced with "test" one by one and convert the document until Bibi throws an error. Then, you can remove the sentences of the offending paragraph one by one until the document is read again without error. That way you could isolate at least one of the sentences that cause the error.
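
The procedure described here can be sketched in Python as below. This is only an illustration under assumptions: `fails` and `fails_with` are hypothetical callbacks standing in for "convert with pandoc, open in Bibi, see whether the error appears", which in practice is probably a manual check, and the function names are mine.

```python
def first_offending_paragraph(originals, placeholder, fails):
    """Restore original paragraphs one at a time over a placeholder document.

    Returns the index of the first paragraph whose restoration makes
    fails(document) true, or None if the document never fails.
    """
    current = [placeholder] * len(originals)
    for i, paragraph in enumerate(originals):
        current[i] = paragraph
        if fails(current):
            return i
    return None


def suspect_sentence(sentences, fails_with):
    """Drop sentences from the end one by one.

    Returns the sentence whose removal first makes fails_with(remaining)
    false, or None if the error persists with no sentences left.
    """
    remaining = list(sentences)
    while remaining:
        removed = remaining.pop()
        if not fails_with(remaining):
            return removed
    return None


if __name__ == "__main__":
    # Stand-in predicate: pretend the error is triggered by a no-break space.
    def has_nbsp(chunks):
        return any("\u00a0" in chunk for chunk in chunks)

    paragraphs = ["First paragraph.", "Second\u00a0paragraph.", "Third paragraph."]
    print(first_offending_paragraph(paragraphs, "test", has_nbsp))
    print(repr(suspect_sentence(["One.", "Two\u00a0bad.", "Three."], has_nbsp)))
```

Since removing sections at random also made the error go away, more than one passage may be involved; in that case this only finds the first one, and the procedure would have to be repeated.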

On Monday 27 February 2023 at 06:33:28AM, 'Peter Vedal Utnes' via pandoc-discuss wrote:
> I just did some further testing, and replaced the sections that I would
> otherwise have removed with the same number of words and paragraphs, but no symbols, only
> "test test test" etc. The document then works. So I was wrong about the length:
> it must be some character or symbol producing the error (only with pandoc, not
> other EPUB converters). Any idea how to further isolate it, or how to
> work around it with a pandoc command or template?
>
> Thanks for the help so far, Bernardo.
>
> On Monday, 27 February 2023 at 15:23:57 UTC+1, Peter Vedal Utnes wrote:
>
> I am not sure what you mean by "normalize" in this context. I'll elaborate in
> case this is what you mean: to remove variables that
> might interfere with troubleshooting, I copied the text from research
> papers (not just one, but a few), pasted it into Notepad, then copied and pasted
> it back into a new Word file (this is more thorough than "clear
> formatting"), ran this "pure" file through pandoc, and I still get the error. If I
> then randomly shorten the file, the error disappears. This is not the case
> for my "test" file, only for research papers, which is baffling. I can
> only assume that pandoc responds to something like a character or in-text
> references in particular contexts, or, as was my original hypothesis, the
> number of lines or columns in the EPUB.
>
> On Monday, 27 February 2023 at 15:17:10 UTC+1, bernardov...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>
> Have you tried editing the original research paper in some minor way
> (adding or removing a couple of characters) and then running it? This
> is a completely wild guess, but maybe the text in the file is getting
> normalized when you edit it, whereas the original research paper still
> contains the unedited, unnormalized text.
>
> On Mon, Feb 27, 2023 at 10:48 AM 'Peter Vedal Utnes' via pandoc-discuss
> <pandoc-...@googlegroups.com> wrote:
>
> Thank you for the suggestion. It is proving somewhat hard to
> (dis)confirm. I have made a test file with just the word "test"
> pasted over and over again, with and without various formatting, and
> at the same length as the proper papers or longer. This file
> consistently works. But when I try it with a regular
> research paper, it only works if I shorten it. Curiously, I can
> remove either half of the main text, or indeed sections here and
> there at random, and it works, but not with all of them present. I
> have combed it for special characters or tags, but cannot find
> any.
>
> On Monday, 27 February 2023 at 13:49:58 UTC+1, Bernardo C. D. A.
> Vasconcelos wrote:
>
> I do not know the answer to this problem in particular, but
> perhaps it is worth checking the main document and the
> bibliography for invisible control characters (e.g. `\X{A0}`).
> They tend to cause all sorts of strange problems that result in
> seemingly random error messages.
>
> On Monday, February 27, 2023 at 8:16:20 AM UTC-3 Peter Vedal
> Utnes wrote:
>
> We have a workflow in Open Journal Systems where we use
> Pandoc to convert Word documents to EPUB, and then display
> them with an embedded EPUB app (Bibi).
>
> Our resulting EPUBs work fine with both debuggers and
> viewers like calibre. They work in Bibi, but only when they
> are reduced to a certain length. Whenever the files exceed
> approx. 100 lines or 600 words, Bibi reports:
>
> TypeError: Cannot read properties of undefined (reading ‘getAttribute’)
>
> Meanwhile, the same documents work when converted to EPUB
> using other converters, or when I reduce the length
> (length, not size in bytes; I've tried with graphics, and it
> still works). It suddenly works when I reduce the length by
> removing pure paragraph text, even though all the formatted
> elements (abstract, references, etc.) are the same.
>
> I recognize that this problem is very specific to the
> interaction between pandoc and Bibi, but I'd be grateful for
> general troubleshooting suggestions.
>
> Thanks in advance,
>
> Peter
>

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/1691e374-df1e-46a4-b4b4-8213b5d3c16en%40googlegroups.com.