From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
Subject: Re: Error caused by document length
Date: Mon, 27 Feb 2023 16:39:49 +0000
To: 'Peter Vedal Utnes' via pandoc-discuss

If you narrow down the document to the offending sentences (or only one of
them), does Bibi fail to read the resulting EPUB? Such minimal source and
EPUB documents would be easier to inspect, and the latter could even be
included in a bug report for Bibi.
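As a rough aid for that inspection, a short Python script can list every character in the EPUB's XHTML files that is not plain ASCII, with its Unicode code point, category and name, so that invisible characters such as U+00A0 (no-break space) or U+200B (zero-width space) become visible. This is only a sketch, not part of pandoc or Bibi: the function names `suspicious_chars` and `report_epub` are made up, and it assumes the EPUB's content documents end in `.xhtml` and are UTF-8 encoded.

```python
import unicodedata
import zipfile


def suspicious_chars(text):
    """Return (index, char, code point, category, name) for every character
    that is neither printable ASCII nor a newline, carriage return or tab."""
    hits = []
    for i, ch in enumerate(text):
        if ch in "\n\r\t" or " " <= ch <= "~":
            continue  # ordinary ASCII: skip it
        hits.append((
            i,
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),           # e.g. 'Zs', 'Cf', 'Cc', 'Pd'
            unicodedata.name(ch, "<unnamed>"),  # control chars have no name
        ))
    return hits


def report_epub(path):
    """Print the suspicious characters found in each XHTML file of an EPUB.

    An EPUB is a ZIP archive. Assumption: content documents end in '.xhtml'
    and are UTF-8 encoded.
    """
    with zipfile.ZipFile(path) as epub:
        for member in epub.namelist():
            if not member.endswith(".xhtml"):
                continue
            text = epub.read(member).decode("utf-8")
            for i, _, cp, cat, name in suspicious_chars(text):
                print(f"{member}:{i}  {cp}  {cat}  {name}")
```

For example, `report_epub("minimal.epub")` (a hypothetical file name) prints lines of the form `<file>:<offset>  U+00A0  Zs  NO-BREAK SPACE`. Ordinary typographic characters such as the en dash (U+2013) and the curly quotes (U+201C, U+201D) in the failing sentences will be listed too; hits in categories Cc (control) and Cf (format) are the ones most worth a closer look. The script only sees literal characters, not character references such as `&#160;`.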
On Monday 27 February 2023 at 08:22:34AM, 'Peter Vedal Utnes' via pandoc-discuss wrote:
> I have now done the elimination process, as suggested by Bastien: I took the
> working file, which was the EPUB of the research paper where I had swapped
> paragraphs 2-10 for "test test test", and gradually restored the original
> text from the paper. It worked until I tried to restore a sentence in the
> middle of paragraph 3, working down from the top, or paragraph 6, working up
> from the bottom. When I insert the next sentence at either end, the document
> no longer converts into an EPUB that the Bibi viewer can read. There do not
> seem to be any Unicode characters that might interfere. I have run epubcheck
> as you suggested, John, and there are indeed errors (metadata not filled in
> and a missing end tag), but fixing these does not seem to help.
>
> Here are the seemingly innocuous sentences that fail from above and below,
> respectively:
>
> 1) Over years I have experienced much Bronze in the form of articles in toll
> access (TA) journals that have been made freely available for reading – not
> open access, but “Free access” as some publishers call it.
>
> 2) One thing is to help editors to become aware of the issue, another is to
> find practical solutions for them to transition their scholarly content to
> OA – the rest of their content is really not of interest to us.
>
> There seem to be issues with a few other sentences in those 3 paragraphs
> too, but I can't see a pattern.
> Here is the article in question (only the PDF galley; my EPUB testing is on
> a private server): https://septentrio.uit.no/index.php/nopos/article/view/6665
>
> On Monday, 27 February 2023 at 17:08:31 UTC+1, John MacFarlane wrote:
>
> You could try running epubcheck on the EPUB produced by pandoc, to see if
> it points to anything.
>
> > On Feb 27, 2023, at 6:33 AM, 'Peter Vedal Utnes' via pandoc-discuss
> > <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
> >
> > I just did some further testing and replaced the sections that I would
> > otherwise have removed with as many words and paragraphs, but no symbols,
> > only "test test test" etc. The document then works. So I was wrong about
> > the length: it must be some character or symbol producing the error (only
> > with pandoc, not other EPUB converters). Any idea how to isolate it
> > further, or how to circumvent it with a pandoc command or template?
> >
> > Thanks for the help so far, Bernardo.
> >
> > On Monday, 27 February 2023 at 15:23:57 UTC+1, Peter Vedal Utnes wrote:
> > I am not sure what you mean by normalize in this context. I'll elaborate
> > in case this is what you mean: in the interest of removing variables that
> > might interfere with troubleshooting, I have copied the text from research
> > papers (not just one, but a few), pasted it into Notepad, copied and
> > pasted it back into a new Word file (this is more thorough than "clear
> > formatting"), ran this "pure" file through pandoc, and I get the error. If
> > I then randomly shorten the file, the error disappears. This is not the
> > case for my "test" file, only for research papers, which is baffling. I
> > can only assume that pandoc responds to something like a character or
> > in-text references in particular contexts, or, as was my original
> > hypothesis, the number of lines or columns in the EPUB.
> >
> > On Monday, 27 February 2023 at 15:17:10 UTC+1, bernardov...@gmail.com wrote:
> > Have you tried editing the original research paper in some minor way
> > (adding or removing a couple of characters) and then running it? This is
> > a completely wild guess, but maybe the text in the file gets normalized
> > when you edit it, whereas the original research paper still contains the
> > unedited, unnormalized text.
> >
> > On Mon, Feb 27, 2023 at 10:48 AM 'Peter Vedal Utnes' via pandoc-discuss
> > <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
> > Thank you for the suggestion. It is proving somewhat hard to confirm or
> > rule out. I have made a test file with just the word "test" pasted over
> > and over again, with and without various formatting, and with the same
> > length as the proper papers or longer. This file consistently works. But
> > when I attempt the same with a regular research paper, it only works if I
> > shorten it. Curiously, I can remove either half of the main text, or
> > indeed sections here and there, randomly, and it works, but not with all
> > of them present. I have combed it for special characters or tags, but
> > cannot find any.
> >
> > On Monday, 27 February 2023 at 13:49:58 UTC+1, Bernardo C. D. A. Vasconcelos wrote:
> > I do not know the answer to this problem in particular, but perhaps it is
> > worth checking the main document and the bibliography for invisible
> > control characters (e.g. `\X{A0}`). They tend to cause all sorts of
> > strange problems that result in random error messages.
> >
> > On Monday, February 27, 2023 at 8:16:20 AM UTC-3 Peter Vedal Utnes wrote:
> > We have a workflow in Open Journal Systems where we use Pandoc to convert
> > Word documents to EPUB, and then display them with an embedded EPUB app
> > (Bibi).
> >
> > Our resulting EPUBs work fine with both debuggers and viewers like
> > calibre. They work in Bibi, but only when they are reduced to a certain
> > length. Whenever the files exceed approximately 100 lines or 600 words,
> > Bibi reports:
> >
> > TypeError: Cannot read properties of undefined (reading 'getAttribute')
> >
> > Meanwhile, the same documents work when converted to EPUB using other
> > converters, or when I reduce the length (length, not size in bytes: I've
> > tried with graphics, and it still works).
> > It suddenly works when I reduce the length by removing pure paragraph
> > text, even though all the formatted elements (abstract, references, etc.)
> > are the same.
> >
> > I recognize that this problem is very specific to the interrelation
> > pandoc <-> Bibi, but I'd be grateful for general troubleshooting
> > suggestions.
> >
> > Thanks in advance,
> >
> > Peter

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/Y/zc1XW7hY71aWqy%40localhost.