From: Karl Dahlke
To: Edbrowse-dev@lists.the-brannons.com
Date: Sat, 18 Mar 2017 05:16:42 -0400
Subject: [Edbrowse-dev] when the innerHTML string is wrong

Now that
innerHTML is available more often, particularly body.innerHTML (the entire page, which some scripts want), I must point out a bug that we inherit from tidy. Consider the example at www.eklhad.net/div, which is small enough that I include it here.
Cognitive business is here
The script tag is there just so that we create javascript; otherwise there wouldn't be any. Tidy knocks the div section outside the anchor, rewrites the html that way, and hands that to us, and that is what I use for innerHTML. Now I come along and cleverly detect what has happened, and work around it by moving the div node back underneath the anchor where it belongs. Even if I'm right, that is, even if this isn't a false positive where I shouldn't have moved anything, the html is still wrong. Specifically, innerHTML is wrong, including a.innerHTML and body.innerHTML. Jump into jdb and see for yourself.

This is truly an instance of Sir Walter Scott's: "Oh, what a tangled web we weave, when first we practise to deceive!"

We should probably work more closely with the tidy developers to prevent some of these html rewritings, rather than work around them. Then again, the tidy crew might say, "We call ourselves tidy because we *fix* html; it is antithetical for us to always leave it the way it is." And yet that is what edbrowse needs.

We could fork tidy and make it do what we want, but I *really* don't want to do that, as we would lose the benefit of them maintaining and enhancing it as html evolves. It's the same reason we don't want to fork mozjs, or curl, etc.

I'm not sure what the solution is here. I added some comments to this effect in decorate.c, though not as long-winded as this email.

Karl Dahlke
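
To make the mismatch concrete, here is a minimal sketch in plain javascript. It is not edbrowse code: the el(), innerHTML() and outerHTML() helpers are hypothetical, the node objects are a toy stand-in for the real tree, attributes and the script tag are left out, and the two "tidy" strings are only an assumption about the shape of the rewritten html. It shows that once the div node has been moved back under the anchor, the text held for innerHTML no longer agrees with what the tree itself would serialize to.

```javascript
// Toy node constructor (hypothetical, not edbrowse's data structures).
// Children are either strings (text) or other nodes.
function el(tag, ...children) {
  return { tag, children };
}

// Serialize a node's children, i.e. what innerHTML would read
// if it were derived from the tree rather than from tidy's text.
function innerHTML(node) {
  return node.children
    .map((c) => (typeof c === "string" ? c : outerHTML(c)))
    .join("");
}

// Serialize the node itself, tags included (no attributes in this sketch).
function outerHTML(node) {
  return "<" + node.tag + ">" + innerHTML(node) + "</" + node.tag + ">";
}

// Assumed shape of the html after tidy has pushed the div out of the anchor.
const tidyAnchorText = "";
const tidyBodyText = "<a></a><div>Cognitive business is here</div>";

// The tree after the workaround: the div is back underneath the anchor.
const div = el("div", "Cognitive business is here");
const a = el("a", div);
const body = el("body", a);

const treeAnchorText = innerHTML(a);
const treeBodyText = innerHTML(body);

// Both comparisons print false: the stored strings describe tidy's layout,
// not the layout of the repaired tree.
console.log(treeAnchorText === tidyAnchorText);
console.log(treeBodyText === tidyBodyText);
```

Serializing from the tree, as the sketch does, is only one conceivable direction and brings its own costs; it is shown here to illustrate the disagreement, not as a decided fix.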