Date: Wed, 29 Jun 2016 20:02:46 -0700 (PDT)
From: Kevin Carhart
To: edbrowse-dev@lists.the-brannons.com
Subject: [Edbrowse-dev] a bundle of changes for drescher and generally

The diedrescher.com website raises a lot of issues that extrapolate to all pages, so it has been a great place to start. Amazon, dkb.de, drescher, fastmail, google groups, they all tie back to the DOM.

So now what I have ready is some edits that I am going to submit as a bundle. It's possible that they do not all belong in the live program but will be more of a living illustration of what's missing and a neat way of showing how little distance there is between us and some tangible outcomes. This is an acceptable way of working, right? It's like forking the code for a specific purpose.
You can compile a bleeding-edge sandbox without worrying about anything, and it could lead to some if not all of the differences making it into edbrowse.

The idea to work on diedrescher.com came from Sebastian reporting that there is a series of links which don't respond or lead anywhere. So the overall question is: why not? What does the edbrowse side need that it isn't finding in order to tie the click to the handler? What are the site code and libraries expecting to exist that we haven't implemented? It turns out to be a few things in combination.

Here is the illustration of "before". If you click one of the links, you'll get a message like:

beginning of an edbrowse fragment -------------
* {KONTAKT}
g
label kontakt is not found
the edbrowse fragment ends here -------------

However, with the edits applied, it successfully goes through the entire circuit! Here is a real run from "after":

beginning of an edbrowse fragment -------------
b http://diedrescher.com
11907
276
1
{DE} | {EN}
* {DRESCHER}
* {SHOP}
* {BIBLIOTHEK}
* {KONTAKT}
g
label kontakt is not found
rr
lines 19 through 74 have been updated
19
KONTAKT
Band
DRESCHER RECORDS
the edbrowse run fragment ends here -------------

It's a real proof of concept! It goes the whole way and back!

I noticed that it didn't notify me that the lines changed, so I am missing something relating to how edbrowse knows to automatically report that lines changed.

Another flaw is that I can only retrieve the contact/kontakt pages. The other pages come back successfully from XHR, but they have their own wrinkles, some of which involve iframes. I decided that for this bunch of edits, I would comment out "iframe" from availableTags because we have more work on this pending, so for the moment I am ignoring iframes so they can't crash edbrowse altogether.

It is kind of breathtaking how much stuff happens to get this result.
It turns out to be a little bit of everything working together: the xhr code, the timeout code, the DOM implementation, the events code, and giving libraries what they expect.

o First the page code is digested into memory. A lot of functions will become resident, and some code will actually run before the first moment where the page is interactive.

o jquery doles out event handlers to elements. In this scenario, some of this distribution is done based on the value of the 'class' attribute, entirely managed by the library. So this is very different from the easier style where the handler is out in the open in an onclick on a piece of html.

o Edbrowse picks up that these handlers exist.

o So when I press 'g', edbrowse successfully finds the handler on that anchor and runs it.

o It happens that the page author's line of code uses $.get, which is jquery's wrapper around XMLHttpRequest. The page author's call to $.get also includes a callback function which will run on success.

o Now jquery's xhr wrapper uses our XHR code under the hood, to do the HTTP activity and to report a '200 OK' or other status.

o startwindow calls fetchHTTP using the javascript conduit and returns the retrieved HTTP page back to jquery as a response. It reports '200 OK', which jquery's xhr code expects before it will run the success callback. (Maybe something else happens on failure.)

o jquery's xhr runs the callback, using setTimeout. So our setTimeout code comes into it, and I have made it more lenient so that a call with only one argument is allowed.

o The callback code takes the retrieved html page and writes it to the innerHTML of a certain div.

o This triggers our i{ side effect and we have now made the complete circuit back to the edbrowse side! The new div contents go through rendering and there is a message that certain lines changed, which might or might not need a 'rr' to propagate.

hooray!

Patch will be in the next email.

Kevin
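
P.S. For anyone who wants the circuit above in miniature, here is a toy model in plain JavaScript. It is only a sketch under loud assumptions: FakeElement, bindByClass, get, fakeFetch, lenientSetTimeout, runTimers and sideEffects are names made up for illustration, the URL and class names are invented, and none of this is actual edbrowse or jquery code. Timers are run by hand instead of by an event loop so the order of events is easy to follow.

```javascript
// Toy model of: class-based handler -> click -> xhr -> timer -> innerHTML.
// Every name below is a hypothetical stand-in, not edbrowse or jquery code.

// Pending timer callbacks, run by hand via runTimers().
const timerQueue = [];

// Lenient setTimeout stand-in: the delay may be omitted and defaults to 0.
function lenientSetTimeout(callback, delayMs) {
  timerQueue.push({ callback, delayMs: delayMs === undefined ? 0 : delayMs });
}

// Run queued callbacks in insertion order (this toy ignores the delay value).
function runTimers() {
  while (timerQueue.length > 0) {
    timerQueue.shift().callback();
  }
}

// Stand-in for the HTTP layer: a known URL gives 200, anything else 404.
const fakeServer = new Map([["/kontakt.html", "<p>KONTAKT</p>"]]);

function fakeFetch(url) {
  if (fakeServer.has(url)) {
    return { status: 200, body: fakeServer.get(url) };
  }
  return { status: 404, body: "" };
}

// Stand-in for a $.get-style wrapper: on status 200 the success callback
// is deferred through the timer, called with a single argument.
function get(url, onSuccess) {
  const response = fakeFetch(url);
  if (response.status === 200) {
    lenientSetTimeout(() => onSuccess(response.body));
  }
}

// Record of innerHTML writes, standing in for the side effect that tells
// the browser side to re-render.
const sideEffects = [];

class FakeElement {
  constructor(className) {
    this.className = className;
    this.handlers = {};
    this.html = "";
  }
  get innerHTML() {
    return this.html;
  }
  set innerHTML(value) {
    this.html = value;
    sideEffects.push({ element: this, html: value });
  }
  // Returns true if a click handler was found and run, false otherwise.
  click() {
    const handler = this.handlers.click;
    if (typeof handler !== "function") {
      return false;
    }
    handler.call(this);
    return true;
  }
}

// Library-style binding: attach a handler to every element carrying a class.
function bindByClass(elements, className, eventName, handler) {
  for (const el of elements) {
    if (el.className.split(/\s+/).includes(className)) {
      el.handlers[eventName] = handler;
    }
  }
}

const contentDiv = new FakeElement("content");
const kontaktLink = new FakeElement("nav ajax-link");
const plainLink = new FakeElement("nav");

bindByClass([kontaktLink, plainLink], "ajax-link", "click", () => {
  get("/kontakt.html", (html) => {
    contentDiv.innerHTML = html;
  });
});

const handled = kontaktLink.click(); // the equivalent of pressing 'g'
runTimers();                         // the deferred success callback fires
console.log(handled, contentDiv.innerHTML, sideEffects.length);
```

The point of the toy is that nothing on the link itself says what happens on a click: the handler arrives through the class match, and the page only changes once the deferred callback has run and written to innerHTML.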