From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 29 Jun 2016 20:02:46 -0700 (PDT)
From: Kevin Carhart <kevin@carhart.net>
To: edbrowse-dev@lists.the-brannons.com
Message-ID: <alpine.LRH.2.03.1606291859040.17154@carhart.net>
User-Agent: Alpine 2.03 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Subject: [Edbrowse-dev] a bundle of changes for drescher and generally


The diedrescher.com website raises a lot of issues that generalize to
other pages, so it has been a great place to start.  Amazon, dkb.de,
drescher, fastmail, google groups: they all tie back to the DOM.  I now
have some edits ready that I am going to submit as a bundle.  They may
not all belong in the live program; think of them as a living
illustration of what's missing and a neat way of showing how little
distance there is between us and some tangible outcomes.  This is an
acceptable way of working, right?  It's like forking the code for a
specific purpose.  You can compile a bleeding-edge sandbox without
worrying about anything, and it could lead to some if not all of the
differences making it into edbrowse.

The idea to work on diedrescher.com came from Sebastian reporting that
there is a series of links which don't respond or lead anywhere.  So the
overall question is: why not?  What does the edbrowse side need, and fail
to find, in order to tie the click to the handler?  What are the site
code and libraries expecting to exist that we haven't implemented?  It
turns out to be a few things in combination.

Here is the illustration of "before".  If you click one of the links, 
you'll get a message like:


beginning of an edbrowse fragment -------------
* {KONTAKT}
g
label kontakt is not found
the edbrowse fragment ends here -------------


However, with the edits applied, it successfully goes through the entire 
circuit!  Here is a real run from "after":


beginning of an edbrowse fragment -------------
b http://diedrescher.com
11907
276
1
{DE} | {EN}


* {DRESCHER}

* {SHOP}

* {BIBLIOTHEK}

* {KONTAKT}
g
label kontakt is not found
rr
lines 19 through 74 have been updated
19
KONTAKT


Band


DRESCHER RECORDS
the edbrowse run fragment ends here -------------


It's a real proof of concept!  It goes the whole way and back!

I noticed that it didn't notify me that the lines changed until I typed
'rr', so I am missing something relating to how edbrowse knows to
automatically report that lines changed.  Another flaw is that I can only
retrieve the contact/kontakt pages.  The other pages come back
successfully from XHR, but they have their own wrinkles, some of which
involve iframes.  For this bunch of edits I decided to comment out
"iframe" from availableTags because we have more work on this pending, so
for the moment I am ignoring iframes so they can't crash edbrowse
altogether.
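
To show what I mean by commenting a tag out of the list, here is a
minimal sketch in plain JavaScript.  It is an illustration only and not
the edbrowse source: the real availableTags is not shown in this message,
and the list contents, isSupportedTag, and the idea that unlisted tags
are simply skipped are all assumptions made for the example.

```javascript
// Hypothetical tag list; only the name availableTags comes from the message.
const availableTags = [
  "a",
  "div",
  "span",
  // "iframe",  // disabled for now: iframe support is still pending
];

// Assumption for this sketch: a tag that is not listed is skipped,
// so it never reaches code that cannot handle it yet.
function isSupportedTag(tagName) {
  return availableTags.includes(tagName.toLowerCase());
}

const seen = ["DIV", "IFRAME", "A"];
const kept = seen.filter(isSupportedTag);
console.log(kept);
```

Putting the entry back is then a one-line change once the iframe work is
done.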

It is kind of breathtaking how much stuff happens to get this result.  It 
turns out to be a little bit of everything working together: the xhr code, 
the timeout code, the DOM implementation, the events code, and giving 
libraries what they expect.

o First the page code is digested into memory.  A lot of functions will 
become resident, and some code will actually run prior to the first moment 
where it is interactive.
o jquery doles out event handlers to elements.  In this scenario, some of
this distribution is done based on the value of the 'class' attribute,
entirely managed by the library.  So this is very different from the
easier style where the handler is out in the open in an onclick on a piece
of html.
o Edbrowse picks up that these handlers exist.
o So when I press 'g', edbrowse finds the handler on that anchor and runs
it.
o It happens that the page author's line of code uses $.get, which is
jquery's wrapper around XMLHttpRequest.  The page author's call to $.get
also includes a callback function which will run on success.
o jquery's xhr wrapper then uses our XHR code under the hood to do the
HTTP activity and to report a '200 OK' or other status.
o startwindow calls fetchHTTP using the javascript conduit and returns the
retrieved HTTP page to jquery as a response.  It reports '200 OK', which
jquery's xhr code expects before it will run the success callback.  (Maybe
something else happens on failure.)
o jquery's xhr runs the callback using setTimeout.  So our setTimeout
code comes into it, and I have made it more lenient so that a call with
only one argument is allowed.
o The callback code takes the retrieved html page and writes it to the
innerHTML of a certain div.
o This triggers our i{ side effect and we have now made the complete 
circuit back to the edbrowse side!  The new div contents go through 
rendering and there is a message that certain lines changed, which might 
or might not need a 'rr' to propagate.
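
The steps above can be sketched in a few dozen lines of plain JavaScript.
This is a toy model under stated assumptions, not edbrowse or jquery
code: makeElement, bindByClass, fakeFetch, lenientSetTimeout, runTimers,
get, and activate are all hypothetical stand-ins, the "/kontakt" URL and
its content are made up, and the HTTP layer is faked so the example runs
without a network.

```javascript
// Records each innerHTML write, standing in for the side effect that
// tells the edbrowse side there is new content to render.
const sideEffects = [];

function makeElement(tag, className) {
  let html = "";
  return {
    tag,
    className,
    handlers: {},
    get innerHTML() { return html; },
    set innerHTML(value) {
      html = value;
      sideEffects.push("innerHTML set on " + tag + "." + className);
    },
  };
}

// Library-style distribution: handlers are attached by class,
// not written as onclick attributes in the html.
function bindByClass(elements, className, eventName, handler) {
  for (const el of elements) {
    if (el.className === className) el.handlers[eventName] = handler;
  }
}

// Fake HTTP layer (assumption: a real one would fetch over the network).
const fakePages = { "/kontakt": "<h1>KONTAKT</h1>" };
function fakeFetch(url) {
  return url in fakePages
    ? { status: 200, body: fakePages[url] }
    : { status: 404, body: "" };
}

// Lenient timer: a call with only the callback is allowed, and a
// missing delay is treated as 0.  Timers run first-in, first-out.
const timerQueue = [];
function lenientSetTimeout(callback, delay) {
  timerQueue.push({ callback, delay: delay === undefined ? 0 : delay });
}
function runTimers() {
  while (timerQueue.length > 0) timerQueue.shift().callback();
}

// $.get-like wrapper: the success callback is scheduled only on 200.
function get(url, onSuccess) {
  const response = fakeFetch(url);
  if (response.status === 200) {
    lenientSetTimeout(() => onSuccess(response.body)); // one argument
  }
}

// Page setup: one link with class "nav" and one target div.
const link = makeElement("a", "nav");
const content = makeElement("div", "content");
bindByClass([link, content], "nav", "click", () => {
  get("/kontakt", (html) => { content.innerHTML = html; });
});

// What pressing 'g' on a link amounts to in this model:
// look up the click handler and run it if there is one.
function activate(el) {
  const handler = el.handlers.click;
  if (handler) handler();
  return Boolean(handler);
}

activate(link);
runTimers();
console.log(content.innerHTML);
console.log(sideEffects);
```

The point of the sketch is that every link in the chain has to be
present: take away the class-based binding, the 200 status, the
one-argument timer, or the innerHTML side effect, and the div never
changes.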

hooray!
Patch will be in the next email.
Kevin