edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] edbrowse-js back in the fold??
From: Karl Dahlke @ 2015-09-26 14:05 UTC
  To: Edbrowse-dev; +Cc: ubuntu

As per a discussion that has been taking place off line,
and really needs to move to this group,
js has to immediately, and within its innerHTML setter,
parse the new html text and add the new objects to the js tree,
while at the same time, or not long thereafter,
adding the tree of nodes to our tree for rendering.
Both processes now need tidy5, html-tidy.c,
and at least half of the logic in render.c.
With this new revelation,
how much easier would all this be if we hadn't separated edbrowse-js into another process!
As Fagin says in Oliver,
I think I better think it out again.

Don't get me wrong - encapsulating js into a separate entity of some kind,
with its own source file, and the mozilla details hidden in that source file,
and a communication api to and from the js layer,
was absolutely the right thing to do. Absolutely!
Thank you Adam for directing us down this path.
But we did the same for tidy without making another process.
Now if they were once again the same process,
possibly different threads of the same process:

1. One less hassle with the windows port, as threads are standard
and portable, and the spinning off of the process with pipes not so much.

2. js innerHTML and document.write can build js objects and add to our tree of nodes
immediately, in the setter, as is supposed to happen, and all in one go,
all at the same time.

3. No need to pass the html, or the resulting subtree,
back through the pipes and back to edbrowse for incorporation.

4. Better performance (a minor consideration).

5. All of edbrowse is once again a c++ program (a minor nuisance).

6. seg fault on the js side would once again bring down all of edbrowse.
This was one of our considerations,
but I would hope those seg faults are becoming infrequent, and I think they are.

If we really must keep them separate processes, could we use shared memory
so both can work on the one common tree of nodes?
Is shmget portable to windows?
Doesn't shmget require a fixed block of memory of a fixed size?
That's the way I remember it.
That's how the man page reads.
That wouldn't work well with our model;
I want to be able to dynamically grow the tree as big as the web page is,
without compile time constraints or even run time commitment to a size,
as we have to do for instance with mozilla's js pool.
I mean we could set a pool size at run time for the trees of html nodes managed by edbrowse;
it wouldn't be a showstopper, just not my first preference.
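
A minimal sketch of the fallback described above: a node pool whose size is fixed once at run time rather than at compile time. The names (node, node_pool, pool_init, pool_add, pool_free) are hypothetical and not edbrowse's; the point is only that once the pool is full, further nodes are refused, which is the same constraint a fixed-size shared segment would impose.

```c
#include <stdlib.h>

/* Hypothetical node record; a real html node carries much more. */
struct node {
    int tag;     /* hypothetical tag code */
    int parent;  /* index of the parent node, -1 for the root */
};

/* Capacity is chosen once, at run time, and never grows. */
struct node_pool {
    struct node *slots;
    size_t used, capacity;
};

/* Allocate the whole pool up front; returns 0 on success, -1 on failure. */
static int pool_init(struct node_pool *p, size_t capacity)
{
    p->slots = calloc(capacity ? capacity : 1, sizeof *p->slots);
    p->used = 0;
    p->capacity = p->slots ? capacity : 0;
    return p->slots ? 0 : -1;
}

/* Returns the new node's index, or -1 once the pool is exhausted:
 * a page bigger than the pool simply does not fit. */
static int pool_add(struct node_pool *p, int tag, int parent)
{
    if (p->used == p->capacity)
        return -1;
    p->slots[p->used].tag = tag;
    p->slots[p->used].parent = parent;
    return (int)p->used++;
}

static void pool_free(struct node_pool *p)
{
    free(p->slots);
    p->slots = NULL;
    p->used = p->capacity = 0;
}
```

The capacity would come from something like a config option at startup; unlike malloc-per-node, a page larger than that capacity cannot be held at all.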

After the last flurry of work settles down and stabilizes
(and this has been all good stuff,
all moving us forward in the right direction),
we need to discuss
and plan and design before making the next big change.
We either need to move some html / render functionality into both processes,
with subtree data coming back through pipes,
or combine things back into one edbrowse process,
or find a shared memory solution.


Karl Dahlke

* Re: [Edbrowse-dev] edbrowse-js back in the fold??
From: Adam Thompson @ 2015-09-28  4:42 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev, ubuntu

On Sat, Sep 26, 2015 at 10:05:15AM -0400, Karl Dahlke wrote:
> As per a discussion that has been taking place off line,
> and really needs to move to this group,
> js has to immediately, and within its innerHTML setter,
> parse the new html text and add the new objects to the js tree,
> while at the same time, or not long thereafter,
> adding the tree of nodes to our tree for rendering.
> Both processes now need tidy5, html-tidy.c,
> and at least half of the logic in render.c.

Yes, that would appear to be the case.

> With this new revelation,
> how much easier would all this be if we hadn't separated edbrowse-js into another process!
> As Fagin says in Oliver,
> I think I better think it out again.
> 
> Don't get me wrong - encapsulating js into a separate entity of some kind,
> with its own source file, and the mozilla details hidden in that source file,
> and a communication api to and from the js layer,
> was absolutely the right thing to do. Absolutely!
> Thank you Adam for directing us down this path.

Thanks, shame it's not quite working out as hoped... but see below.

> But we did the same for tidy without making another process.
> Now if they were once again the same process,
> possibly different threads of the same process:
> 
> 1. One less hassle with the windows port, as threads are standard
> and portable, and the spinning off of the process with pipes not so much.

Hmmm, see thoughts below re: the possibility of a portable (well, OK, an external
library, but portable from our side) way of making this work.

> 2. js innerHTML and document.write can build js objects and add to our tree of nodes
> immediately, in the setter, as is supposed to happen, and all in one go,
> all at the same time.

Agreed, this would work, but we'd run into issues when we make this stuff
asynchronous (that needs to happen soon).

> 3. No need to pass the html, or the resulting subtree,
> back through the pipes and back to edbrowse for incorporation.

The subtree looks difficult to do without some sort of intermediate
representation.
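
One possible intermediate representation, sketched under assumptions: walk the subtree in pre-order and write one fixed-size record per node (tag plus child count) down the pipe; the reader rebuilds the links from the counts alone. The tnode and wire_rec types and the send_subtree, recv_subtree and free_tree helpers are hypothetical; attributes are omitted and tag names are truncated to 15 bytes.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TAGLEN 16

/* Hypothetical in-memory node: tag plus first-child / next-sibling links. */
struct tnode {
    char tag[TAGLEN];
    struct tnode *child, *sibling;
};

/* Wire record: no pointers, just the tag and how many children follow. */
struct wire_rec {
    char tag[TAGLEN];
    int nchildren;
};

static int write_full(int fd, const void *p, size_t n)
{
    const char *b = p;
    while (n > 0) {
        ssize_t k = write(fd, b, n);
        if (k < 0 && errno == EINTR)
            continue;
        if (k < 0)
            return -1;
        b += k;
        n -= (size_t)k;
    }
    return 0;
}

static int read_full(int fd, void *p, size_t n)
{
    char *b = p;
    while (n > 0) {
        ssize_t k = read(fd, b, n);
        if (k < 0 && errno == EINTR)
            continue;
        if (k <= 0)
            return -1;          /* error, or EOF in the middle of a record */
        b += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Send the subtree rooted at n in pre-order; returns 0 on success. */
static int send_subtree(int fd, const struct tnode *n)
{
    struct wire_rec r;
    const struct tnode *c;
    memset(&r, 0, sizeof r);
    strncpy(r.tag, n->tag, TAGLEN - 1);
    for (c = n->child; c; c = c->sibling)
        r.nchildren++;
    if (write_full(fd, &r, sizeof r) != 0)
        return -1;
    for (c = n->child; c; c = c->sibling)
        if (send_subtree(fd, c) != 0)
            return -1;
    return 0;
}

/* Free a node, its children, and its following siblings. */
static void free_tree(struct tnode *n)
{
    while (n) {
        struct tnode *next = n->sibling;
        free_tree(n->child);
        free(n);
        n = next;
    }
}

/* Rebuild one node and, recursively, its children; NULL on error. */
static struct tnode *recv_subtree(int fd)
{
    struct wire_rec r;
    struct tnode *n, **link;
    if (read_full(fd, &r, sizeof r) != 0)
        return NULL;
    n = calloc(1, sizeof *n);
    if (!n)
        return NULL;
    memcpy(n->tag, r.tag, TAGLEN);
    n->tag[TAGLEN - 1] = '\0';
    link = &n->child;
    for (int i = 0; i < r.nchildren; i++) {
        struct tnode *c = recv_subtree(fd);
        if (!c) {
            free_tree(n);
            return NULL;
        }
        *link = c;
        link = &c->sibling;
    }
    return n;
}
```

Because the child count travels with each record, no pointers or node ids need to cross the pipe; the cost is that the receiver must consume the whole subtree before it knows the shape is complete.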

> 4. Better performance (a minor consideration).

Not necessarily. One of my issues with multi-threading and our current code
base is that there are simply entire chunks which aren't thread-safe and would
require all kinds of mutex hell to make run reliably.
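
To make the mutex concern concrete: if js ran as a thread over edbrowse's node tree, every reader and writer would have to go through the same lock. A minimal sketch with POSIX threads, assuming a hypothetical shared_tree reduced to an array of tag codes and a hypothetical js_thread standing in for a script's innerHTML setter.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared tree, reduced to a growable array of tag codes. */
struct shared_tree {
    pthread_mutex_t lock;
    int *tags;
    size_t count, cap;
};

static int tree_init(struct shared_tree *t)
{
    t->tags = NULL;
    t->count = t->cap = 0;
    return pthread_mutex_init(&t->lock, NULL);
}

/* Both the UI thread and the js thread must take the lock for every access. */
static int tree_append(struct shared_tree *t, int tag)
{
    int rc = 0;
    pthread_mutex_lock(&t->lock);
    if (t->count == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 16;
        int *p = realloc(t->tags, ncap * sizeof *p);
        if (p) {
            t->tags = p;
            t->cap = ncap;
        } else {
            rc = -1;
        }
    }
    if (rc == 0)
        t->tags[t->count++] = tag;
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Even a read must lock, or it may see the array mid-realloc. */
static size_t tree_count(struct shared_tree *t)
{
    size_t n;
    pthread_mutex_lock(&t->lock);
    n = t->count;
    pthread_mutex_unlock(&t->lock);
    return n;
}

static void tree_destroy(struct shared_tree *t)
{
    pthread_mutex_destroy(&t->lock);
    free(t->tags);
}

/* Stand-in for a script whose innerHTML setter adds nodes from the js thread. */
struct js_job {
    struct shared_tree *tree;
    int nodes;
};

static void *js_thread(void *arg)
{
    struct js_job *job = arg;
    for (int i = 0; i < job->nodes; i++)
        tree_append(job->tree, 1);
    return NULL;
}
```

One lock around the whole tree is the simplest correct scheme; the catch raised above is that every existing piece of code that walks the tree today does so without any lock and would have to be audited.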

> 5. All of edbrowse is once again a c++ program (a minor nuisance).

That assumes we stick with our already rather outdated spidermonkey version
(Firefox is on... version 31 or probably more like 35 now;
not sure when they'll cut a new smjs release, if ever).

> 6. seg fault on the js side would once again bring down all of edbrowse.
> This was one of our considerations,
> but I would hope those seg faults are becoming infrequent, and I think they are.

Not to mention the possibility of js-induced deadlocks etc in (for example)
html node tree access.

> If we really must keep them separate processes, could we use shared memory
> so both can work on the one common tree of nodes?
> Is shmget portable to windows?
> Doesn't shmget require a fixed block of memory of a fixed size?
> That's the way I remember it.
> That's how the man page reads.
> That wouldn't work well with our model;
> I want to be able to dynamically grow the tree as big as the web page is,
> without compile time constraints or even run time commitment to a size,
> as we have to do for instance with mozilla's js pool.
> I mean we could set a pool size at run time for the trees of html nodes managed by edbrowse;
> it wouldn't be a showstopper, just not my first preference.

I'm not sure about the portability of <shared_memory_api> but I'm not sure that's where we should go either.
I think, if I remember my original design correctly,
I was thinking more of having the DOM in a separate process,
maybe even one per browser buffer. We went for just moving the js at the time
because we needed to encapsulate things and allow switching js engines,
but the more I learn about ajax the more I believe we really need
buffer-specific browser processes communicating back to the Edbrowse main ui somehow.
I've talked about this with someone at work actually and he agreed that for
something like our ui it makes sense to adopt a sort of "browser as a server"
model where the networking, html parsing etc is all handled in a server process
and then the interface does the rendering.

Now for the portability discussion (see above).
I was thinking, seeing as we need all sorts of networking,
asynchronous processing etc, whether it'd make sense to look at using a library to do this.
In particular I was thinking of libuv as I think (from memory)
it has a Windows port.

> After the last flurry of work settles down and stabilizes
> (and this has been all good stuff,
> all moving us forward in the right direction),
> we need to discuss
> and plan and design before making the next big change.
> We either need to move some html / render functionality into both processes,
> with subtree data coming back through pipes,
> or combine things back into one edbrowse process,
> or find a shared memory solution.

Or head down the above route. I'd also throw out there
that we have web sockets becoming a
progressively larger "thing" in web development,
so my original plan of just using libcurl to implement an XMLHttpRequest object
just won't fly any more. Js now expects not only full networking,
but also (apparently) the ability to read files from the user's computer (see the
FileReader object on the MDN site for example).
I believe gmail actually uses such an object in the non-basic file upload
mechanism (it's certainly not a standard multi-part form upload).

Cheers,
Adam.

* Re: [Edbrowse-dev] edbrowse-js back in the fold??
From: Chris Brannon @ 2015-09-28 15:20 UTC
  To: Adam Thompson; +Cc: Karl Dahlke, ubuntu, Edbrowse-dev

Adam Thompson <arthompson1990@gmail.com> writes:

>> 5. All of edbrowse is once again a c++ program (a minor nuisance).
>
> That assumes we stick with our already rather outdated spidermonkey version

Any progress with looking into duktape?
Would you like me to have a go at it?

> I'm not sure about the portability of <shared_memory_api> but I'm not sure that's where we should go either.
> I think, if I remember my original design correctly,
> I was thinking more of having the DOM in a separate process,
> maybe even one per browser buffer. We went for just moving the js at the time
> because we needed to encapsulate things and allow switching js engines,

Yes, this is also how I remember that discussion.

> I was thinking, seeing as we need all sorts of networking,
> asynchronous processing etc, whether it'd make sense to look at using a library to do this.

So how would that look, exactly?

> Or head down the above route. I'd also throw out there
> that we have web sockets becoming a
> progressively larger "thing" in web development,

And we also have stuff like HTTP/2 server push coming up on us.


* [Edbrowse-dev]  edbrowse-js back in the fold??
From: Karl Dahlke @ 2015-09-28 17:28 UTC
  To: Edbrowse-dev

A quick note regarding our arrangement of software into processes:

Having performed a rethink,
and having wrestled with some of the things we need to implement,
I'm convinced we need to stay the course, edbrowse and edbrowse-js.
It allows for some of the asynchronicity we'll want in the future,
edbrowse-js failure or lockup does not bring down edbrowse,
edbrowse can meaningfully run and provide many features even if an
individual or distributor cannot build edbrowse-js,
and most important, different copies of functions that look the same
from the caller's side, like get_property_string(object, membername) for instance,
must do two different things in the two processes.
There are dozens of such functions, making up most of the layer in ebjs.c.
This is driven by the need to parse, render, and possibly decorate html in edbrowse
when there is no js, or js is broken or unbuilt,
and also in the js world while the script is running, under a setter.
The fact that html must be parsed and processed in these two different contexts
drives much of the above, so I press on.
Both processes now contain the tidy machinery,
our transformation of the tidy tree into our tree,
some prerendering of the tree, and the decoration of that tree
with js objects, which is done by remote calls from edbrowse
and by native calls in edbrowse-js.
But the code in decorate.c is the same either way.
That is the magic that I have set up.
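
The "same code, different plumbing" idea can be illustrated with a function table. This is only an analogue: the arrangement described here links different copies of functions such as get_property_string into each process, whereas the sketch below selects them at run time so both flavours fit in one file. js_ops, fake_obj, native_get, remote_get and decorate_link are hypothetical names, the buffer-based signature is an assumption, and the remote flavour merely formats the request it would send.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical interface: what shared decoration code needs from "js". */
struct js_ops {
    /* Copies the member's value into buf; returns 0 on success. */
    int (*get_property_string)(void *obj, const char *member,
                               char *buf, size_t len);
};

/* Toy object with one string property, standing in for an engine object. */
struct fake_obj {
    const char *name, *value;
};

/* edbrowse-js flavour: ask the engine directly (faked here). */
static int native_get(void *obj, const char *member, char *buf, size_t len)
{
    const struct fake_obj *o = obj;
    if (strcmp(o->name, member) != 0)
        return -1;
    snprintf(buf, len, "%s", o->value);
    return 0;
}

/* edbrowse flavour: would send a request down the pipe to edbrowse-js;
 * here it only formats the request it would send. */
static int remote_get(void *obj, const char *member, char *buf, size_t len)
{
    (void)obj;
    snprintf(buf, len, "REMOTE get %s", member);
    return 0;
}

static const struct js_ops native_ops = { native_get };
static const struct js_ops remote_ops = { remote_get };

/* Shared code, identical in both processes: only the ops table differs. */
static int decorate_link(const struct js_ops *ops, void *obj,
                         char *buf, size_t len)
{
    return ops->get_property_string(obj, "href", buf, len);
}
```

With link-time selection, as described above, the table disappears: each process links its own definition of get_property_string and the shared code calls it directly.
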
Geoff, do I have to change cmake files as the .o dependencies change?
Example, both processes now share html-tidy.o and decorate.o.
I really need to learn cmake.

What frightened me initially, passing back the created subtree
from edbrowse-js back to edbrowse, is not so frightening,
as I thought of a clever way to do it last night.
I don't have to pack up and represent the tree in some long confusing ascii
string and pass it back and unpack it again, I can do something else.

Keeping a common tree in shared memory is not practical as:
shmget is not portable to windows, and, more important,
it's not just a tree of nodes but each node may have allocated strings
hanging off of it, for the tag attributes etc,
and I just can't tell malloc to use, in certain situations,
a block of shared memory for its pool.
It would all be a nightmare, and there's really a better way.
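
For the record, here is roughly what the shared-memory route would demand, as a hypothetical sketch: every node and every string hanging off it is carved out of one block by a private allocator, linked by offsets instead of pointers, because the block may be mapped at a different address in each process. How the block is obtained (shmget, mmap, or a Windows file mapping) is deliberately left out; arena, anode and the helper functions are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One contiguous block that two processes could each map, possibly at
 * different addresses, so nothing stored in it may be a raw pointer. */
struct arena {
    unsigned char *base;
    size_t used, size;
};

/* A node refers to its tag string and relatives by offset; 0 means none. */
struct anode {
    uint32_t tag, first_child, next_sibling;
};

static void arena_init(struct arena *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 8;               /* offset 0 is reserved to mean "null" */
}

/* Bump allocation with 8-byte alignment; returns 0 when the block is full,
 * since a fixed block cannot grow the way malloc's heap does. */
static uint32_t arena_alloc(struct arena *a, size_t n)
{
    size_t start = (a->used + 7) & ~(size_t)7;
    if (start > a->size || n > a->size - start)
        return 0;
    a->used = start + n;
    return (uint32_t)start;
}

/* Turn an offset into a pointer valid in this process's mapping. */
static void *at(const struct arena *a, uint32_t off)
{
    return off ? a->base + off : NULL;
}

/* Tag and attribute strings must live in the block too, not on the heap. */
static uint32_t arena_strdup(struct arena *a, const char *s)
{
    size_t n = strlen(s) + 1;
    uint32_t off = arena_alloc(a, n);
    if (off)
        memcpy(a->base + off, s, n);
    return off;
}

static uint32_t node_new(struct arena *a, const char *tag)
{
    uint32_t off = arena_alloc(a, sizeof(struct anode));
    uint32_t t;
    struct anode *n;
    if (!off || !(t = arena_strdup(a, tag)))
        return 0;
    n = at(a, off);
    n->tag = t;
    n->first_child = n->next_sibling = 0;
    return off;
}

static void node_append_child(struct arena *a, uint32_t parent, uint32_t child)
{
    struct anode *p = at(a, parent);
    uint32_t *link = &p->first_child;
    while (*link)
        link = &((struct anode *)at(a, *link))->next_sibling;
    *link = child;
}
```

Every string operation on a node would have to go through helpers like these instead of plain malloc and strdup, which is the nightmare referred to above.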

So I don't plan to rock the boat in any significant way.
We have a good start and can keep going forward in reasonable steps.
Geoff says our processes and interprocess communication are portable,
with a little work, so that's good.

Karl Dahlke

* Re: [Edbrowse-dev] edbrowse-js back in the fold??
From: Adam Thompson @ 2015-09-29  7:25 UTC
  To: Chris Brannon; +Cc: Karl Dahlke, ubuntu, Edbrowse-dev

On Mon, Sep 28, 2015 at 08:20:18AM -0700, Chris Brannon wrote:
> Adam Thompson <arthompson1990@gmail.com> writes:
> 
> >> 5. All of edbrowse is once again a c++ program (a minor nuisance).
> >
> > That assumes we stick with our already rather outdated spidermonkey version
> 
> Any progress with looking into duktape?
> Would you like me to have a go at it?

I'm hoping to make some time for it Friday evening and hopefully Sunday,
but if you want to look at it as well that'd certainly help.
Unfortunately work and life are conspiring to eat all of my development time.
I figure if we both keep pushing changes and talking then we can minimise merge
issues and get this thing done a lot quicker.

> > I'm not sure about the portability of <shared_memory_api> but I'm not sure that's where we should go either.
> > I think, if I remember my original design correctly,
> > I was thinking more of having the DOM in a separate process,
> > maybe even one per browser buffer. We went for just moving the js at the time
> > because we needed to encapsulate things and allow switching js engines,
> 
> Yes, this is also how I remember that discussion.
> 
> > I was thinking, seeing as we need all sorts of networking,
> > asynchronous processing etc, whether it'd make sense to look at using a library to do this.
> 
> So how would that look, exactly?

Ok, I've not studied libuv's api in detail but I was thinking that we really
need to have the back-end being a server-type process which handles the
networking etc. independently of the interface.
We then have the interface sat on top with the rendering code (note: not the
html tidy code), which, either on a request from the user or on some UI event,
requests the current DOM for the current buffer from the browser process.
The server would handle spinning up other processes (maybe per buffer; I'm not sure).
It'd also handle network fetch requests (i.e. e http://some-url.com),
since that'd allow all the weird and wonderful
networking that servers are starting to use.
There are many potential issues here,
not least of which is passing the node tree back to the edbrowse client process
in a relatively performant way, though I think passing user changes to the
server is relatively well understood.
Any thoughts on this?
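
A minimal sketch of how such client/server requests might be framed, assuming a hypothetical fixed header (message type, buffer number, payload length) followed by the payload, sent over a pipe or socket between two processes on the same machine, so host byte order is used. The message types and helper names are invented for illustration; a DOM reply would carry some serialized form of the node tree.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request types for the UI <-> browser-server link. */
enum msg_type { MSG_FETCH_URL = 1, MSG_GET_DOM = 2, MSG_DOM_REPLY = 3 };

/* Fixed header followed by `length` payload bytes. */
struct msg_hdr {
    uint32_t type;
    uint32_t buffer_id;   /* which edbrowse buffer the message concerns */
    uint32_t length;      /* payload size in bytes */
};

static int write_full(int fd, const void *p, size_t n)
{
    const char *b = p;
    while (n > 0) {
        ssize_t k = write(fd, b, n);
        if (k < 0 && errno == EINTR)
            continue;
        if (k < 0)
            return -1;
        b += k;
        n -= (size_t)k;
    }
    return 0;
}

static int read_full(int fd, void *p, size_t n)
{
    char *b = p;
    while (n > 0) {
        ssize_t k = read(fd, b, n);
        if (k < 0 && errno == EINTR)
            continue;
        if (k <= 0)
            return -1;    /* error, or EOF partway through a message */
        b += k;
        n -= (size_t)k;
    }
    return 0;
}

static int send_msg(int fd, uint32_t type, uint32_t buffer_id,
                    const void *payload, uint32_t length)
{
    struct msg_hdr h;
    memset(&h, 0, sizeof h);
    h.type = type;
    h.buffer_id = buffer_id;
    h.length = length;
    if (write_full(fd, &h, sizeof h) != 0)
        return -1;
    return write_full(fd, payload, length);
}

/* Reads one message; fails if the payload would not fit in cap bytes. */
static int recv_msg(int fd, struct msg_hdr *h, void *payload, size_t cap)
{
    if (read_full(fd, h, sizeof *h) != 0)
        return -1;
    if (h->length > cap)
        return -1;
    return read_full(fd, payload, h->length);
}
```

Length-prefixed framing keeps the reader simple whatever the payload is; an event library such as libuv would take over the waiting on the descriptors, not the framing itself.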

Cheers,
Adam.

* [Edbrowse-dev]  edbrowse-js back in the fold??
From: Karl Dahlke @ 2015-09-29  8:16 UTC
  To: Edbrowse-dev

> Any progress with looking into duktape?

As for duktape, you may want to simply "look around" / research,
rather than diving in with both feet and translating jseng-moz.cpp.
I say this because that process is changing a lot, more than I thought.
I've already discussed the reformulation of innerHTML and document.write,
with new hooks to tidy parsing and tree formulation and tree decoration,
but beyond this there are more native functions than I first thought.
This is what Adam predicted quite some time ago.
I mean there aren't tons of them, but more than I thought.
So it is in a bit of flux at the moment; it should settle down
to stability again in a month or so.

Karl Dahlke
