Date: Fri, 11 Sep 2015 19:02:17 +0100
From: Adam Thompson
To: Karl Dahlke
Cc: edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] script tags in scripts
Message-ID: <20150911180217.GB29720@toaster.adamthompson.me.uk>
In-Reply-To: <20150811061713.eklhad@comcast.net>
References: <55F21D99.7070701@pcdesk.net> <20150911073939.GA29720@toaster.adamthompson.me.uk> <20150811061713.eklhad@comcast.net>

On Fri, Sep 11, 2015 at 06:17:13AM -0400, Karl Dahlke wrote:
> > I'm not sure what we can do about this,
> > but I'm inclined to think that whatever we do won't catch every case and that
> > at some stage we have to accept that and move on.
>
> That was true of my parser, true of tidy5, and true of any parser,
> however, as you point out regularly, we should handle most websites
> that other browsers handle.
> And when we don't,
> entire web pages shouldn't disappear beyond the point of error.
> This bug is produced by fanfiction.net and fictionpress.com,
> two high volume sites that work on every other browser.

Agreed, we need to work out what's breaking here and why it affects tidy5
and not, say, Firefox.
I may try the pages with some other HTML parsing libs (unfortunately not
usable in edbrowse, as they're written in e.g. Python or Perl) to see what
they do with the pages.

I'm just saying that I think we should continue to move forward with the
design on the basis that tidy5 will be fixed.
If it isn't, then we'll need to look at alternatives, but there are a lot of
elements of the new design which I think should stay in any case.

> And by the way, my thanks to those users who exercise and test our bleeding edge software;
> you're as brave as a Windows 10 insider.

I second this. We need users to test this software, and I appreciate the time
and effort it takes to keep on top of the latest code,
particularly when we're adding library dependencies.

> In any case, tidy5 needs to fix this,
> or we need to find a way to preprocess around it,
> the latter meaning I'd have to keep at least half of my parser,
> which I really wanted to throw away entirely. :(

Maybe, or we keep the tidy-inspired design but rewrite the parsing logic,
perhaps borrowing the parsing code from somewhere else and making it our own.
I know I said we should try to stay out of the HTML parsing business,
and ideally I'd still like to, but if we really can't, then we can at least
keep the current design direction.
There has to be a parsing lib out there somewhere which works properly...
at least I hope there is.

Cheers,
Adam.