From: "'Peter Vedal Utnes' via pandoc-discuss"
Newsgroups: gmane.text.pandoc
Subject: Re: Error caused by document length
Date: Mon, 27 Feb 2023 06:23:57 -0800 (PST)
To: pandoc-discuss

I am not sure what you mean by "normalize" in this context, so I'll elaborate in case this is what you mean. To rule out variables that might interfere with troubleshooting, I copied the text from several research papers (not just one), pasted it into Notepad, then copied it back into a new Word file (this is more thorough than "clear formatting"). When I run this "pure" file through pandoc, I get the error. If I then randomly shorten the file, the error disappears. This does not happen with my "test" file, only with research papers, which is baffling. I can only assume that pandoc reacts to something like a particular character or in-text references in certain contexts, or, as was my original hypothesis, to the number of lines or columns in the EPUB.

On Monday, 27 February 2023 at 15:17:10 UTC+1, bernardov...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> Have you tried editing the original research paper in some minor way (adding or removing a couple of characters) and then running it? This is a completely wild guess, but maybe the text in the file gets normalized when you edit it, whereas the original research paper still contains the unedited, unnormalized text.
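If "normalized" refers to Unicode normalization (an assumption; the suggestion does not say which kind), the guess can be tested without editing the file: check whether the text is already in NFC/NFKC form and which characters would change. A minimal Python sketch, assuming the paper's text has been exported to a UTF-8 plain-text file; `normalization_report` is a hypothetical helper, not part of pandoc:

```python
import sys
import unicodedata


def normalization_report(text):
    """Summarize how `text` differs from its NFC and NFKC forms."""
    suspects = {}
    for char in text:
        # Combining marks (e.g. U+0301) indicate decomposed text that NFC would
        # merge with the preceding letter; characters whose NFKC form differs
        # are compatibility characters such as U+00A0 or the ligature U+FB01.
        if unicodedata.combining(char) or unicodedata.normalize("NFKC", char) != char:
            suspects[char] = suspects.get(char, 0) + 1
    return {
        "is_nfc": unicodedata.is_normalized("NFC", text),
        "is_nfkc": unicodedata.is_normalized("NFKC", text),
        "suspects": {
            f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}": count
            for c, count in sorted(suspects.items())
        },
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python normcheck.py paper.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(normalization_report(handle.read()))
```

Running it on the original text and on a trivially edited copy shows whether editing actually changes the normalization state.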

> On Mon, Feb 27, 2023 at 10:48 AM 'Peter Vedal Utnes' via pandoc-discuss <pandoc-...@googlegroups.com> wrote:
>> Thank you for the suggestion. It is proving somewhat hard to (dis)confirm. I have made a test file with just the word "test" pasted over and over again, with and without various formatting, at the same length as the real papers or longer. This file consistently works. But when I try the same with a regular research paper, it only works if I shorten it. Curiously, I can remove either half of the main text, or indeed sections here and there at random, and it works, but not with all of them present. I have combed it for special characters or tags, but cannot find any.
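The removal experiment above can be made systematic by bisecting on the number of leading paragraphs. The Python sketch below assumes the failure is monotone in length (if a prefix fails, every longer prefix fails, and an empty document does not), which the report suggests but does not establish. `fails` is a hypothetical callback you would supply yourself, for instance one that converts the first n paragraphs with pandoc and asks you whether Bibi renders the result.

```python
from typing import Callable, Optional, Sequence


def shortest_failing_prefix(
    paragraphs: Sequence[str], fails: Callable[[Sequence[str]], bool]
) -> Optional[int]:
    """Return the smallest n such that fails(paragraphs[:n]) is True,
    or None if even the full document does not fail.

    Assumes monotonic behaviour and that an empty prefix does not fail.
    """
    if not fails(paragraphs):
        return None
    # Invariant: prefix of length `low` passes, prefix of length `high` fails.
    low, high = 0, len(paragraphs)
    while high - low > 1:
        mid = (low + high) // 2
        if fails(paragraphs[:mid]):
            high = mid
        else:
            low = mid
    return high
```

Each call to `fails` is one manual check in Bibi, so a 200-paragraph paper needs about nine checks (one for the full text plus roughly log2(200) bisection steps). If repeated runs give inconsistent answers, the monotonicity assumption does not hold, and specific content is a more likely trigger than length.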

>> On Monday, 27 February 2023 at 13:49:58 UTC+1, Bernardo C. D. A. Vasconcelos wrote:
>>> I do not know the answer to this particular problem, but it may be worth checking the main document *and* the bibliography for invisible or control characters (e.g. the no-break space `\X{A0}`). They tend to cause all sorts of strange problems that result in seemingly random error messages.
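A minimal Python sketch of such a check, assuming a .docx input. A .docx file is a ZIP archive whose body text is stored in `w:t` elements of `word/document.xml`, so the script extracts that text paragraph by paragraph. It then reports control (Cc), format (Cf, e.g. zero-width spaces) and non-standard space characters. The helper names are hypothetical, and the category list is one reasonable definition of "invisible", not an exhaustive one. Footnotes (`word/footnotes.xml`) and Word's own tab or hyphen elements are not scanned.

```python
import sys
import unicodedata
import zipfile
from xml.etree import ElementTree

# Namespace of WordprocessingML elements such as <w:p> and <w:t>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Ordinary whitespace that should not be reported.
ALLOWED = {"\n", "\r", "\t", " "}

# Unicode categories treated as "invisible" here (an assumption):
# control, format, and space/line/paragraph separators.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def find_invisible(text):
    """Yield (line, column, code point, name) for each suspect character."""
    line, column = 1, 0
    for char in text:
        if char == "\n":
            line, column = line + 1, 0
            continue
        column += 1
        if char not in ALLOWED and unicodedata.category(char) in SUSPECT_CATEGORIES:
            yield line, column, f"U+{ord(char):04X}", unicodedata.name(char, "<unnamed>")


def docx_paragraphs(path):
    """Return the text of each <w:p> paragraph in word/document.xml.

    Only <w:t> text runs are read; Word's <w:tab/>, <w:noBreakHyphen/> and
    similar elements are not characters in the text and are not reported.
    """
    with zipfile.ZipFile(path) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python findinvisible.py paper.docx
    source = sys.argv[1]
    if source.endswith(".docx"):
        # One paragraph per line, so "line" is the paragraph number.
        text = "\n".join(docx_paragraphs(source))
    else:  # e.g. a .bib bibliography or a plain-text export
        with open(source, encoding="utf-8") as handle:
            text = handle.read()
    for hit in find_invisible(text):
        print(*hit)
```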

>>> On Monday, February 27, 2023 at 8:16:20 AM UTC-3 Peter Vedal Utnes wrote:
>>>> We have a workflow in Open Journal Systems where we use Pandoc to convert Word documents to EPUB, and then display them with an embedded EPUB app (Bibi).
>>>>
>>>> Our resulting EPUBs work fine with both debuggers and viewers like calibre. They also work in Bibi, but only when they are shortened below a certain length. Whenever the files exceed approximately 100 lines or 600 words, Bibi reports:

>>>> TypeError: Cannot read properties of undefined (reading 'getAttribute')

>>>> Meanwhile, the same documents work when converted to EPUB with other converters, or when I reduce the length (length, not size in bytes: I've tried adding graphics, and it still works). It suddenly works when I reduce the length by removing plain paragraph text, even though all the formatted elements (abstract, references, etc.) are unchanged.

>>>> I recognize that this problem is very specific to the interplay between pandoc and Bibi, but I'd be grateful for general troubleshooting suggestions.

>>>> Thanks in advance,
>>>>
>>>> Peter
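One way to probe both the length hypothesis and the difference from other converters is to compare how each EPUB splits its text. An EPUB is a ZIP archive, so a short Python script can list each (X)HTML content file with its size and an approximate word count. Running it on pandoc's EPUB and on one from another converter shows, for example, whether pandoc puts the whole paper into a single large file. This is a sketch: `epub_content_stats` is a hypothetical name, and the word count (whitespace-separated tokens after stripping tags with a regular expression) is only approximate. By default pandoc splits EPUB output into chapter files at level-1 headings (`--epub-chapter-level`, renamed `--split-level` in pandoc 3; check `pandoc --help` for your version), so a paper without level-1 headings may end up as one file.

```python
import re
import sys
import zipfile

# Crude tag matcher; good enough for an approximate word count.
TAG = re.compile(r"<[^>]+>")


def epub_content_stats(path):
    """Return (name, uncompressed bytes, approx. words) for each (X)HTML file."""
    stats = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith((".xhtml", ".html", ".htm")):
                markup = archive.read(info).decode("utf-8", errors="replace")
                words = len(TAG.sub(" ", markup).split())
                stats.append((info.filename, info.file_size, words))
    return stats


if __name__ == "__main__":
    # Usage (hypothetical file names): python epubstats.py pandoc.epub other.epub
    for epub in sys.argv[1:]:
        print(epub)
        for name, size, words in epub_content_stats(epub):
            print(f"  {name:<40} {size:>8} bytes  ~{words} words")
```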


--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/db7972f9-8881-4941-92ea-9b8f51c0c404n%40googlegroups.com.