edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] a JS centric design
@ 2015-11-06 19:51 Karl Dahlke
  2015-11-07 16:13 ` Adam Thompson
  0 siblings, 1 reply; 5+ messages in thread
From: Karl Dahlke @ 2015-11-06 19:51 UTC (permalink / raw)
  To: Edbrowse-dev

So I was thinking, if I were to start all over again,
what might I do differently?
All those native methods and their side effects are very awkward:
they don't do the right thing all the time,
they are engine specific since they are in C,
so they have to be modified if we upgrade or switch js engines,
and you still have to check a lot of js variables anyway because
js doesn't always call those functions to do its thing.
So ...

What if there was a lot more js at the start, more prototypes,
more functions, more js setters, and some specific edbrowse variables under
window.eb$, variables that we can query after js has run.
There would be almost no native methods,
only one that I can think of and I'll get to that below.
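The startup state might look something like this. window.eb$ is the namespace from the message; the field names inside it are illustrative guesses, not anything edbrowse actually defines:

```javascript
// A sketch of the proposed startup script, evaluated before any page js
// runs.  edbrowse would read the eb$ fields back after each js call.
// The field names under eb$ are illustrative, not real edbrowse names.
var window = {};                // stand-in for the js engine's global
window.eb$ = {
  cookieList: [],               // cookies set while page js was running
  jumpNewLocation: null,        // url to fetch once js returns
  jumpFormSubmit: null          // form to submit once js returns
};
```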

The real downside here is that there are two ways to render text.
First the one we already have which is based on the html tree.
We have to keep that one because the user might run without js,
for any number of reasons.
The second, now, is to render the possibly modified js tree.
Start at document.body and traverse childNodes depth first.
This could build the text buffer directly, or, it could
build a tree of nodes whence we call the first render routine above.
It might be nice to leverage the preexisting render routine.
Such an update would happen after every javascript call: push a button,
modify a field with onchange code, make a different selection
with onselect code, submit a form with onsubmit code, you get the idea.
It looks a little painful performance-wise, but in reality
I don't think it makes any difference.
You wouldn't run the update all that often, web pages
aren't that huge, and computers are pretty fast.
I don't think performance matters; the two different render routines,
that's the downside.
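The second render path above could be sketched in a few lines. The node shape here (nodeName, data, childNodes) is an assumption for illustration, not edbrowse's real structure:

```javascript
// A minimal sketch of rendering the js tree: start at document.body and
// traverse childNodes depth first, collecting text.  Real rendering
// would also handle forms, links, etc.; this only gathers text nodes.
function renderText(node) {
  if (node.nodeName === "#text") return node.data;    // leaf: emit text
  var out = "";
  for (var i = 0; i < node.childNodes.length; i++)
    out += renderText(node.childNodes[i]);            // recurse depth first
  return out;
}

// tiny stand-in for a possibly js-modified document.body
var body = { nodeName: "BODY", childNodes: [
  { nodeName: "#text", data: "hello " },
  { nodeName: "B", childNodes: [ { nodeName: "#text", data: "world" } ] }
] };
```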

The upside is that almost nothing is done through native methods and side effects
passed back to the html process; everything is gleaned after js returns,
and nothing is missed.
Nothing gets lost in translation.
The whole js tree, whatever it is, is rerendered.

Let's look at some other nonrendering side effects.
document.cookie could have an inbuilt setter
to add the new cookie to a list of cookies to be processed
when js returns,
window.eb$.cookieList[],
and the setter would also fold the new cookie into the cookie string
that is returned by the getter when document.cookie is queried.
It's pretty easy, certainly easier than the native code we have today
jseng-moz.cpp line 955.
Perhaps not less code, but more maintainable.
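The setter/getter pair described above might look like this. It ignores expiry, path matching, and deletion; those would need handling in a real version:

```javascript
// Sketch of an inbuilt document.cookie setter.  The setter queues the
// raw cookie in eb$.cookieList for edbrowse to process after js
// returns, and folds the name=value part into the string the getter
// reports back to the page.
var eb$ = { cookieList: [] };
var cookieString = "";
var document = {};
Object.defineProperty(document, "cookie", {
  set: function (v) {
    eb$.cookieList.push(v);                    // processed when js returns
    var pair = v.split(";")[0];                // keep just name=value
    cookieString += (cookieString ? "; " : "") + pair;
  },
  get: function () { return cookieString; }
});
```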

Here's something that seems like it still has to be native.
document.location.href = "new web page".
js doesn't keep going, it stops and edbrowse has to fetch a new web page.
So whenever edbrowse has to take action, right now,
not later, not delayed,
not when js is finished but right now,
that's a candidate for a native method.
But here again, it doesn't have to be native, because js is supposed to stop.
So set document.location.href like you normally would,
set a jump flag in window.eb$.jumpNewLocation, and then throw an
exception so that js stops.
Same model, edbrowse checks everything after js returns.
It sees the jump flag and goes to a new web page.
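The jump-flag idea could be sketched like so; the exception text and everything beyond eb$.jumpNewLocation are illustrative:

```javascript
// Sketch of the href jump flag: the setter records the target under
// eb$ and throws, so the running script stops; edbrowse finds the flag
// after js returns and fetches the new page.
var eb$ = { jumpNewLocation: null };
var location = {};
Object.defineProperty(location, "href", {
  set: function (url) {
    eb$.jumpNewLocation = url;     // edbrowse fetches this page later
    throw new Error("eb$jump");    // stop the running script
  }
});
try { location.href = "http://example.com/"; } catch (e) { /* js stops */ }
```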

Here's another one, document.forms[0].submit().
Run the onsubmit code first and if that's ok then set a jump flag
in window.eb$.jumpFormSubmit0, and throw an exception so js stops.
Still nothing is native.
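The same pattern, sketched for submit(). Here one eb$ field holds the form index rather than the per-form eb$.jumpFormSubmit0 name used above; that variation and the object shape are just illustrative:

```javascript
// Run the onsubmit handler first; if it doesn't veto, flag the form
// and throw so the script stops.  edbrowse then submits forms[index].
var eb$ = { jumpFormSubmit: null };
function makeForm(index) {
  return {
    onsubmit: null,
    submit: function () {
      // a false return from onsubmit vetoes the submission
      if (this.onsubmit && this.onsubmit() === false) return;
      eb$.jumpFormSubmit = index;    // edbrowse submits this form later
      throw new Error("eb$jump");    // halt the running script
    }
  };
}
var form0 = makeForm(0);
try { form0.submit(); } catch (e) { /* js stops */ }
```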

The only thing I've found so far that really must be native is that pesky innerHTML.
It has to parse html and fold objects into the js tree now,
before the next line of js runs,
and, js does not stop, so we can't just throw an exception.
I'm not going to translate the entire tidy system into js,
so that will remain a C routine.
innerHTML has to be native, and run the text through tidy,
and our html-tidy.c or some variation thereof
to make js nodes, and paste them into the tree,
then it returns and js marches on.
That's how the native method works today,
and we could pretty much keep it as is.
But that's it.
Is it possible that if I were starting all over again
with a js centric design that there would be only one native method, innerHTML?
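Even then, the js-visible side of innerHTML could be a plain setter that delegates to the one native routine. eb$parseHTML below is a hypothetical stand-in for the tidy-based native parser, not edbrowse's actual code:

```javascript
// innerHTML as a setter over a single native hook: the native routine
// (stubbed here) runs the text through tidy, builds js nodes, and the
// setter pastes them into the tree before the next js line runs.
function eb$parseHTML(s) {                     // stub for the C routine
  return [ { nodeName: "#text", data: s } ];
}
var div = { nodeName: "DIV", childNodes: [] };
Object.defineProperty(div, "innerHTML", {
  set: function (html) {
    div.childNodes = eb$parseHTML(html);       // native: tidy + node build
  }
});
div.innerHTML = "hello";
```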

I'm not saying this is a better design, although part of me thinks it is,
since there is less engine specific code, and each time we render the
text buffer straight from the horse's mouth.
But I don't know.
And I'm certainly not planning any changes of this magnitude any time soon.
We need to march towards 3.6.0 and stability.
I just wanted to put this idea out there,
in case I get hit by a bus tomorrow or something.

Karl Dahlke

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Edbrowse-dev] a JS centric design
  2015-11-06 19:51 [Edbrowse-dev] a JS centric design Karl Dahlke
@ 2015-11-07 16:13 ` Adam Thompson
  2015-11-07 16:33   ` Karl Dahlke
  2015-11-07 22:23   ` Chris Brannon
  0 siblings, 2 replies; 5+ messages in thread
From: Adam Thompson @ 2015-11-07 16:13 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Fri, Nov 06, 2015 at 02:51:14PM -0500, Karl Dahlke wrote:
> So I was thinking, if I were to start all over again,
> what might I do differently?
> All those native methods and their side effects, very awkward,
> and they don't do the right thing all the time,
> and they are engine specific since they are in C,
> so have to be modified if we upgrade or switch js engines,
> and you still have to check a lot of js variables anyways because
> js doesn't always call those functions to do its thing.
> So ...
>
> What if there was a lot more js at the start, more prototypes,
> more functions, more js setters, and some specific edbrowse variables under
> window.eb$, variables that we can query after js has run.
> There would be almost no native methods,
> only one that I can think of and I'll get to that below.

Ok, I'm going to start my response with a big *NO* (but please read on for
my reasoning).

> The real down side here is there are two ways to render text.
> First the one we already have which is based on the html tree.
> We have to keep that one because the user might run without js,
> for any number of reasons.
> The second, now, is to render the possibly modified js tree.
> Start at document.body and traverse childNodes depth first.
> This could build the text buffer directly, or, it could
> build a tree of nodes whence we call the first render routine above.
> It might be nice to leverage the preexisting render routine.
> Such would happen after every javascript call, push the button,
> modify a field with onchange code, make a different selection
> with onselect code, submit a form with onsubmit code, you get the idea.
> It looks a little painful relative to performance but in reality
> I don't think it makes any difference.
> You wouldn't run the update all that often, and web pages
> aren't that huge, and computers are pretty fast.
> Don't think performance matters, but the two different render routines,
> that's the down side.

Ok, if you're talking about modern web pages, performance could start to be a
serious issue with this, particularly when we get into async js.
At the moment it's not, and there are *so many* other issues (though a decreasing
number) that performance isn't top of the list.
However, there are performance issues which could become more of a priority as
we get an increasingly complete js implementation.

> The up side is almost nothing is done through native methods and side effects
> passed back to the html process, everything is gleaned after js returns,
> and nothing is missed.
> Nothing gets lost in translation.
> The whole js tree, whatever it is, is rerendered.

That's true, but I fear we're seeing shiny things and mistaking them for good
design (see below).

> Let's look at some other nonrendering side effects.
> document.cookie could have an inbuilt setter
> to add the new cookie to a list of cookies to be processed
> when js returns,
> window.eb$.cookieList[],
> and the setter would also fold the new cookie into the cookie string
> that is returned by getter when document.cookie is queried.
> It's pretty easy, certainly easier than the native code we have today
> jseng-moz.cpp line 955.
> Perhaps not less code, but more maintainable.

And it's both insecure and incredibly fragile against a malicious web page.
Someone could insert all sorts of crud in there,
or use some sort of compromise to insert a magic property in place of the array
to, for example (in a world where we have ajax),
capture all cookies set by a different website or similar (and I've not even
tried to think about this too hard).

> Here's something that seems like it still has to be native.
> document.location.href = "new web page".
> js doesn't keep going, it stops and edbrowse has to fetch a new web page.
> So whenever edbrowse has to take action, right now,
> not later, not delayed,
> not when js is finished but right now,
> that's a candidate for a native method.
> But here again, it doesn't have to be native, because js is suppose to stop.
> So set document.location.href like you normally would,
> set a jump flag in window.eb$.jumpNewLocation, and then throw an
> exception so that js stops.
> Same model, edbrowse checks everything after js returns.
> It sees the jump flag and goes to a new web page.

Unless someone uses an iframe to set this (again via a compromised site).
This is a window-level property, so an iframe etc. could do all sorts of damage here.

> Here's another one, document.forms[0].submit().
> Run the onsubmit code first and if that's ok then set a jump flag
> in window.eb$.jumpFormSubmit0, and throw an exception so js stops.
> Still nothing is native.

Or a compromise could bypass form validation, capture the url somehow, or redirect the user somewhere else.

> The only thing I've found so far that really must be native is that pesky innerHTML.
> It has to parse html and fold objects into the js tree now,
> before the next line of js runs,
> and, js does not stop, so we can't just throw an exception.
> I'm not going to translate the entire tidy system into js,
> so that will remain a C routine.
> innerHTML has to be native, and run the text through tidy,
> and our html-tidy.c or some variation thereof
> to make js nodes, and paste them into the tree,
> then it returns and js marches on.
> That's how the native method works today,
> and we could pretty much keep it as is.
> But that's it.
> Is it possible that if I were starting all over again
> with a js centric design that there would be only one native method, innerHTML?

Perhaps, but you'd have a fragile, easily broken DOM with a bunch of
designed-in security holes. I'm not claiming to be a cyber security expert,
but I do work in that industry, and this design is setting off alarm bells for me.

There's a reason a chunk of DOM objects are read-only;
part of that is performance, but I suspect most of it is to prevent web developers
breaking fundamental mechanisms, as is possible with this design.

> I'm not saying this is a better design, although part of me thinks it is,
> since there is less engine specific code, and each time we render the
> text buffer straight from the horse's mouth.

That's, if anything, why we need a *more* native DOM,
but decoupled from the js engine, i.e.
create object stubs which go back to the DOM to set DOM attributes so we don't
need to make a bunch of js variable checks to render the DOM.
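One reading of the stub idea above: js-visible objects hold no state of their own, and every property access goes back to the DOM side. eb$nativeSet/eb$nativeGet and nativeStore are hypothetical stand-ins for calls into a C-side DOM:

```javascript
// Object stubs whose properties forward to a native DOM, so rendering
// never needs to inspect js variables; the C side is always current.
var nativeStore = {};                          // stands in for the C DOM
function eb$nativeSet(id, key, val) { nativeStore[id + "." + key] = val; }
function eb$nativeGet(id, key) { return nativeStore[id + "." + key]; }

function makeStub(id) {
  var stub = {};
  Object.defineProperty(stub, "title", {
    set: function (v) { eb$nativeSet(id, "title", v); },
    get: function () { return eb$nativeGet(id, "title"); }
  });
  return stub;
}
var node = makeStub("n1");
node.title = "hi";                             // lands on the native side
```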

> But I don't know.
> And I'm certainly not planning any changes of this magnitude any time soon.
> We need to march towards 3.6.0 and stability.
> I just wanted to put this idea out there,
> in case I get hit by a bus tomorrow or something.

It's an interesting idea and seems, superficially, like a good one.
However, if I've learned anything about the internet it's that, in general,
browsers are the primary way of exploiting users' computers and thus it's
important that the attack surface is kept as small as possible.
That means reducing the amount of internal DOM implementation which web pages
can fiddle with, and probably means moving more of our code into native C,
or at least implementing tighter security around it.
Without wanting to sound too negative,
we're currently very fortunate that we don't have a more fully featured js
implementation, since we simply don't have the security in place to do this well.
I'm not talking about the DOM stuff; that's not much of a problem, and it's a
very good idea to sort out.
The real issues are going to happen when we get AJAX and the file system
accessing functionality present in really modern browsers' implementations
(e.g. new versions of Firefox).

Cheers,
Adam.



* [Edbrowse-dev]  a JS centric design
  2015-11-07 16:13 ` Adam Thompson
@ 2015-11-07 16:33   ` Karl Dahlke
  2015-11-07 22:23   ` Chris Brannon
  1 sibling, 0 replies; 5+ messages in thread
From: Karl Dahlke @ 2015-11-07 16:33 UTC (permalink / raw)
  To: Edbrowse-dev

Adam, thanks for your thoughts and concerns.
I posted because I really do want to know.
I'd like to hear from others as well.

On the surface I don't see anything a web page could do in my js centered model,
e.g. sticking variables in window.eb$, that it couldn't already do straight away
by fiddling with document.cookie or document.location or document.forms[0].action
or any of those things, so it seems all the same to me.
But as you say, correctly, we really have to give this
a lot of thought before taking even a small step in that direction;
we have to be convinced it won't open up any new loopholes.

I know what you mean about the browser being the main point of entry
for hackers, though these days it might be phishing emails.
Ten years ago my wife's Explorer was hijacked,
and she was looking at my web site which I wrote, and seeing all sorts of
hyperlinks that weren't there, links that I didn't put in,
links her hijacked browser was creating out of thin air,
to direct her to other websites.
Twas one of my biggest WTF moments.
It made me sick, and she's never been on Windows since,
which solves most of the problem, but yes, I know what you mean.

Karl Dahlke


* Re: [Edbrowse-dev] a JS centric design
  2015-11-07 16:13 ` Adam Thompson
  2015-11-07 16:33   ` Karl Dahlke
@ 2015-11-07 22:23   ` Chris Brannon
  2015-11-07 22:35     ` Karl Dahlke
  1 sibling, 1 reply; 5+ messages in thread
From: Chris Brannon @ 2015-11-07 22:23 UTC (permalink / raw)
  To: Edbrowse-dev

Adam Thompson <arthompson1990@gmail.com> writes:

> And both insecure and incredibly fragile against a malicious web page.
> Someone could insert all sorts of crud in there,
> or use some sort of compromise to insert a magic property in place of the array
> to, for example (in a world where we have ajax)
> capture all cookies set by a different website or similar (and I've not even
> tried to think about this too hard).

Yes, the possibility of exposing internals to JavaScript seems too
fraught with danger, and it seems that the best (and only?) place for
them is in native code.

> That's, if anything, why we need a *more* native DOM,
> but decoupled from the js engine, i.e.
> create object stubs which go back to the DOM to set DOM attributes so we don't
> need to make a bunch of js variable checks to render the DOM.

Yes, the problem we had with native code was, first and foremost, that
it was fragile.  How many months did we spend porting from Spidermonkey
1.8.5 to Spidermonkey 24?  What happens when 24 gets end-of-lifed and
we're stuck scrambling to move to a new engine?  This is why moving as
much as possible out of native code is so darned attractive.
It's a very sweet siren's song.  But if we could come up with a native
DOM that wasn't tied to an engine, it would be even better.  So where do
we start?

> However, if I've learned anything about the internet it's that, in general,
> browsers are the primary way of exploiting users' computers

That's because the industry has been singing the "do it in the browser"
tune since the mid-90s, and now, a document delivery platform is being
used for everything under the sun, from word processing to online
banking.  It wasn't designed for most of these things, so we have this
horrible impedance mismatch.
Unfortunately, we can't change that.

-- Chris


* [Edbrowse-dev]  a JS centric design
  2015-11-07 22:23   ` Chris Brannon
@ 2015-11-07 22:35     ` Karl Dahlke
  0 siblings, 0 replies; 5+ messages in thread
From: Karl Dahlke @ 2015-11-07 22:35 UTC (permalink / raw)
  To: Edbrowse-dev

> if we could come up with a native DOM that wasn't tied to an engine,
> it would be even better. So where do we start?

Actually we've made more strides in this direction than you might realize.

Fold all the js engine specific code into one file, currently jseng-moz.cpp.
Make an api that talks to that file, currently implemented in ebjs.[ch].
Keep a lot of the DOM functionality on this side of that API, in decorate.c and html.c.

All that was done in the past year.
Remember when js engine specific code was *everywhere*?
Sprinkled all over edbrowse?
That's one reason the moz upgrade was so horrible.
Remember how many JS AutoCompartment calls we used to have?
Now there are two.
And when we gathered that code together into one file it was at least
twice as big as it is now, having moved much of it to the other side.
So there is reason to believe we're gradually making sense of this.

Karl Dahlke

