edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] Frame AutoExpansion
From: Karl Dahlke @ 2017-08-05 22:40 UTC
  To: Edbrowse-dev

After the release, I'd like to think about expanding frames automatically.
It's complicated so I'm asking your advice.

First, I don't want to do it at all.
Opening websites is slow already; how much slower would it be if I automatically expanded each frame before you could see the text?
And most of those frames I don't care about anyway; they are advertising or supplementary information.
I like the current paradigm: type exp if you want to expand a frame.
Most of them I never expand.
But ... as Kevin has discovered ... some of the acid tests, and perhaps some of the real world sites,
have javascript that assumes the frames are expanded.
They dip into the objects in those frames, and sometimes twiddle those objects.
So we have some choices here.

1. Don't expand the frames and just realize that some sites won't work properly.
Hopefully very few sites, hopefully it's just advertising or visual effects that don't run.
In other words do nothing, that's the easiest.

2. Expand each frame in the window.
I'd have to expand the frames first, then run the javascript for the main window, because the frames are supposed to be there.
As part of parsing html, aha, a <frame> tag, stop what you're doing and go expand that frame, then resume.
And that frame could contain another frame and so on.
All my html parsing and stacking has to be reentrant, and I'll bet it's not today.
So a bit of work, and again, it's gonna slow down edbrowse, which is already pretty slow.

2A. All the frames probably fetch the same javascript files as the original window, so it would be nice if I could just pull them from cache.
I know we already do this, but I mean don't even issue the HEAD request.
Surely the javascript isn't going to change in the quarter second since I last fetched it.
Maybe this should be a general mechanism,
if you're fetching js, and it's in cache, and you accessed it less than one minute ago, then skip the head request and just grab the file.
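
A minimal sketch of the freshness rule in 2A, written in JavaScript for brevity rather than as edbrowse's actual C code. The `fetchScript` function, the `network` object with `head` and `get`, and the `etag` and `checkedAt` fields are all hypothetical names chosen for illustration; only the one-minute window comes from the text above.

```javascript
// Assumption: one minute, as suggested above.
const FRESH_MS = 60 * 1000;

// cache: Map from URL to { body, etag, checkedAt }.
// network: hypothetical object with head(url) -> etag and get(url) -> { body, etag }.
// now: current time in milliseconds, passed in so the rule is easy to test.
function fetchScript(url, cache, network, now) {
  const entry = cache.get(url);
  if (entry && now - entry.checkedAt < FRESH_MS) {
    // Fetched or validated less than a minute ago: skip the HEAD request entirely.
    return entry.body;
  }
  if (entry && network.head(url) === entry.etag) {
    // Older entry, but the server says it has not changed.
    entry.checkedAt = now;
    return entry.body;
  }
  // Not cached, or changed on the server: fetch it in full.
  const fresh = network.get(url);
  cache.set(url, { body: fresh.body, etag: fresh.etag, checkedAt: now });
  return fresh.body;
}

// Fake network that only counts requests.
const netCalls = { head: 0, get: 0 };
const fakeNetwork = {
  head() { netCalls.head += 1; return "v1"; },
  get() { netCalls.get += 1; return { body: "js source", etag: "v1" }; },
};
const jsCache = new Map();
fetchScript("http://example.com/a.js", jsCache, fakeNetwork, 0);     // full fetch
fetchScript("http://example.com/a.js", jsCache, fakeNetwork, 250);   // no network traffic
fetchScript("http://example.com/a.js", jsCache, fakeNetwork, 61000); // HEAD only
```

The sketch measures time since the last fetch or validation rather than since the last access, so a script that is requested constantly still gets revalidated about once a minute. That is a design choice of the sketch, not something the proposal specifies.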

3. Only as we need it.
I like this but it's the hardest to do.
Don't expand anything at the start.
Instead of a contentDocument object, each frame has a contentDocument getter and setter.
The getter returns the raw object content$Document if it is there, but if not, then the frame has not been expanded.
Expand the frame, link its document object to content$Document, then return content$Document as though it was there all the time.
Sweet, but watch what has to happen.
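
A minimal sketch of the getter and setter described above. The `expandFrame` function is a hypothetical stand-in for the real fetch, parse, and render step, and `makeFrame` is likewise only an illustrative name; `contentDocument` and `content$Document` are the names used in the text.

```javascript
// Hypothetical stand-in for expanding a frame; counts how often it runs.
let expansions = 0;
function expandFrame(frame) {
  expansions += 1;
  return { title: "document for " + frame.src };
}

function makeFrame(src) {
  const frame = { src };
  Object.defineProperty(frame, "contentDocument", {
    get() {
      // No raw object yet means the frame has not been expanded.
      if (!this.content$Document) {
        this.content$Document = expandFrame(this);
      }
      // From here on it looks as though the document was always there.
      return this.content$Document;
    },
    set(doc) {
      this.content$Document = doc;
    },
    enumerable: true,
    configurable: true,
  });
  return frame;
}

const lazyFrame = makeFrame("ad.html");
const firstLook = lazyFrame.contentDocument;  // triggers the expansion
const secondLook = lazyFrame.contentDocument; // reuses content$Document
```

A frame whose `contentDocument` is never touched is never expanded, which is the point of option 3.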

Think of it as a 2 process model, because the messaging is really the same even in one process.
The 2 processes are client server.
edbrowse asks for foo.bar ... js returns the value of foo.bar.
edbrowse sets foo.bar = 7 ... js acknowledges.
edbrowse says to run this javascript ... js runs it and returns the result.
It's a very direct protocol.
To do what I'm talking about, we have to stand it on its head.

edbrowse: run this script
js: from inside the content$Document getter, oh my goodness, we have to expand this frame, send a frame expand message back to edbrowse.
edbrowse is waiting for an acknowledgement or a result from the script, and now it gets a command.
It has to pause what it is doing, pushing things onto a stack of some sort,
and expand the frame, as though the user had typed exp at the keyboard.
Then it tells JS the frame is expanded, pops everything off the stack, and waits for the result of the script that it asked js to run earlier,
now in the state it was in before.
Awkward enough, but even more awkward on the js side.
It is in the middle of the getter, and it has to pause everything, push it onto a stack or something,
and go back to the main message loop, because edbrowse is expanding another frame,
and edbrowse is going to send all sorts of js requests to do that.
Eventually edbrowse will send along a frame-is-done message and js pops everything, goes back to the contentDocument getter,
and returns content$Document.
Wow!
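
The edbrowse half of that exchange can be sketched as follows. This is an illustration under assumptions, not the real protocol: the message types (`run`, `result`, `expand`, `frame-done`) and the scripted `makeChannel` helper are invented here, and the js side is replaced by a canned list of replies.

```javascript
// Hypothetical channel: `incoming` stands in for what the js side would send.
function makeChannel(incoming) {
  const sent = [];
  return {
    sent,
    send(msg) { sent.push(msg); },
    receive() {
      if (incoming.length === 0) throw new Error("js side went silent");
      return incoming.shift();
    },
  };
}

// edbrowse side: ask js to run a script, but be ready for a command coming back.
function runScript(channel, script) {
  channel.send({ type: "run", script });
  for (;;) {
    const msg = channel.receive();
    if (msg.type === "result") return msg.value;
    if (msg.type === "expand") {
      // The outer request is still pending. Expanding the frame runs the
      // frame's own scripts, so we re-enter runScript before answering.
      runScript(channel, "scripts of " + msg.frame);
      channel.send({ type: "frame-done", frame: msg.frame });
      continue;
    }
    throw new Error("unexpected message " + msg.type);
  }
}

const scripted = makeChannel([
  { type: "expand", frame: "frame0" },         // js interrupts with a command
  { type: "result", value: "frame0 loaded" },  // result of the frame's scripts
  { type: "result", value: 7 },                // result of the original script
]);
const outerResult = runScript(scripted, "main page script");
```

In this sketch the recursion lets the language's call stack hold the paused state on the edbrowse side. The js side is the harder half, since it would have to suspend in the middle of a getter and return to its message loop, and that part is not modeled here.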

There's another way that is easier.
Stay within the one process model.
Replace most of the messages with simple function calls.
I don't have to send a message to a js process, or virtual process, to find the value of foo.bar, and wait for the value to come back as a message.
I can just call the native function get_property_string_nat(foo, "bar");
There we go.
When js wants to expand a new frame it doesn't have to send a message the wrong way down a one way street.
It can just call the edbrowse parse and render mechanism as a function,
which calls more js routines, as functions, and C provides the stack.
I don't have to invent state stacks and use setjmp and longjmp, which I've done before but it's like playing with fire, I can let C take care of it.
This assumes duktape is reentrant, which it probably is.
I'm guessing that from the way the context pointer is passed to every duktape function call.
So yes that's good, but it locks us into the one process model.
I'd keep the 2 process model around, but this feature simply wouldn't work in the 2 process world.
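
A rough sketch of the one process idea, in JavaScript rather than C, with the language's call stack playing the role the C stack would play. The `pages` table, `render`, and `makeLazyFrame` are hypothetical; the point is only that a getter can call the render step directly, and that render step can run scripts which trigger further getters.

```javascript
// Hypothetical pages: each lists its frames and a script that pokes into them.
const pages = {
  "main.html": { frames: ["a.html"], script: (doc) => doc.frames[0].contentDocument.title },
  "a.html": { frames: ["b.html"], script: (doc) => doc.frames[0].contentDocument.title },
  "b.html": { frames: [], script: () => "leaf" },
};
const renderTrace = [];

// Stand-in for parse and render: build the document, then run its script.
function render(url, depth) {
  renderTrace.push("  ".repeat(depth) + "render " + url);
  const page = pages[url];
  const doc = {
    title: url,
    frames: page.frames.map((src) => makeLazyFrame(src, depth + 1)),
  };
  // The script may touch a frame, which re-enters render() from the getter.
  doc.scriptResult = page.script(doc);
  return doc;
}

function makeLazyFrame(src, depth) {
  let doc = null;
  return {
    src,
    get contentDocument() {
      // A plain function call; no message goes the wrong way down a one way street.
      if (doc === null) doc = render(src, depth);
      return doc;
    },
  };
}

const mainDoc = render("main.html", 0);
```

Each nested expansion starts and finishes inside the getter that triggered it, so no hand-built state stack is needed. Whether the real parser and duktape tolerate this kind of reentry is exactly the open question raised above.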

That gives you something to ponder over the next week or so.

Karl Dahlke


* Re: [Edbrowse-dev] Frame AutoExpansion
From: Kevin Carhart @ 2017-08-06 23:32 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



Congratulations on the new version release!


In determining what kind of iframe support is worth the work involved, I
think I would add: consider malice, or if not malice, a concerted and
assertive effort by developers to prevent a client like edbrowse from
taking just the content and the site machinery without the parts that
the developers and accountants want impregnably bundled with it.

Karl said,
> the text? And most of those frames I don't care about anyways, they are 
> advertising or supplementary information. I like the current paradigm,

> 1. Don't expand the frames and just realize that some sites won't work properly.
> Hopefully very few sites, hopefully it's just advertising or visual effects that don't run.
> In other words do nothing, that's the easiest.

It's a continuum ultimately, and I'm not arguing for doing more rather 
than less, especially if it creates a lot of work to follow through on, or 
if there is a speed hit, and so on.  However, consider stubbornness or 
bloodymindedness on the part of site authors, because they might be going 
out of their way to make advertising, supplementary information and visual 
effects (just tests for getting and setting certain numeric and string 
variables) compulsory not because they technically have to, but because 
they want to.

They want dough, which means they want to call a 
client environment "supported" or "not supported" based on how passive it 
is towards bundling the money-making bits with content in a way that can't 
be undone by clever command-line people. 
Features like iframe support, CSS support and the kinds of tests found 
in the "supports" section of jquery, are used as a shorthand for whether or 
not they have you sufficiently over a barrel to let you in.  These things 
are used as a substitute for testing the user-agent.  They know we can 
call ourselves a Firefox, but they call our bluff by testing 200 CSS 
attributes in rapid succession and saying "if the series of return values 
!== [true, true, false, 23, 0] they must not be a Firefox.  They might 
be disaggregators, better reject them.  Either tell them to upgrade or 
accuse them of being a bot."  A supported client environment is a 
euphemism for how they get some assurances that they have control.
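
The kind of check described above can be sketched roughly as follows. This is an illustration only: the probe functions and the `looksLikeExpectedBrowser` name are hypothetical, and the expected series is the one quoted above, not taken from any real site.

```javascript
// Run every probe and compare the whole series of return values.
function looksLikeExpectedBrowser(probes, expected) {
  const actual = probes.map((probe) => {
    try {
      return probe();
    } catch (err) {
      // A missing feature usually shows up as an exception.
      return undefined;
    }
  });
  return actual.length === expected.length &&
    actual.every((value, i) => value === expected[i]);
}

const expectedSeries = [true, true, false, 23, 0];

// A client that implements everything the probes look for.
const fullClient = [() => true, () => true, () => false, () => 23, () => 0];

// A client that claims the same user-agent but lacks one feature.
const partialClient = [
  () => true,
  () => true,
  () => false,
  () => { throw new TypeError("feature not implemented"); },
  () => 0,
];

const fullAccepted = looksLikeExpectedBrowser(fullClient, expectedSeries);
const partialAccepted = looksLikeExpectedBrowser(partialClient, expectedSeries);
```

One missing feature is enough to fail the comparison, whatever the user-agent string says.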

What does this mean for implementing iframes and CSS?  We might still 
decide to do less, or not do it at all, but know your adversary (if 
you'll excuse my adversarial stance) and this might be 
a clue towards how pervasive this technique might be and whether the 
potential payoff for the work is a lot or a little.

AND: If we make it past their gatekeepers, we're still an edbrowse!  They 
will have gotten their assurances that we're a passive client, and we can 
carry on with our detailed granular logging of every curl action, go and 
examine and read document.scripts using jdb, and all of the other things 
that are the opposite of a passive, graphically-oriented client.

K


* [Edbrowse-dev] Frame AutoExpansion
From: Karl Dahlke @ 2017-08-07  5:53 UTC
  To: Edbrowse-dev

Sure .. option 1 wasn't really an option, I just put it out there.
I think my last idea is quite doable, and has all the benefits of saving resources when the frames are not needed, and expanding them under the covers when they are.
It does push us towards a one process design, but I think other factors do as well.
Try as we might, it's nearly impossible to keep all the cookies in sync as two processes both fetch html pages from the same website.
98% of the time that won't matter, but sometimes one fetch sets a cookie and the second fetch (in the other process) doesn't work properly unless that cookie is present.
Some of this we manage, like when js sets the cookie via document.cookie, but when cookies ride in on http headers through curl under the covers, it's really hard to handle every case.
This and other concerns go away with one process and one curl space, so we might be going that direction anyways.
I'm currently changing messages to simple function calls, at least for JS1, which will make it easier to do recursive and reentrant things.
This sounds easy but it's not quite: the messaging allows for some nice debugging because I can log each message back and forth.
Without messages I have to put the same debug statements around various function calls.
And I do want the debug features as they are today, very useful.
It's not hard, just lots of details.
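
One way to keep the message-style log once messages become function calls is a small wrapper around each call. The sketch below is in JavaScript and only illustrates the idea; the `traced` helper, the log format, and the `get_property` label are assumptions, not how edbrowse does or will do it.

```javascript
// Wrap a function so each call and each result is logged, like a message pair.
function traced(name, fn, log) {
  return function (...args) {
    log.push("> " + name + " " + JSON.stringify(args));
    const result = fn.apply(this, args);
    log.push("< " + name + " " + JSON.stringify(result));
    return result;
  };
}

const debugLog = [];

// Hypothetical property lookup standing in for a native call.
const getProperty = traced("get_property", (obj, key) => obj[key], debugLog);

const barValue = getProperty({ bar: 7 }, "bar");
```

The caller still sees an ordinary function call, while `debugLog` holds the same request and reply pairs that the message log used to show.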

Karl Dahlke
