edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] One program Two processes
@ 2015-12-23 15:09 Karl Dahlke
  2015-12-23 18:45 ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-23 15:09 UTC (permalink / raw)
  To: Edbrowse-dev

This is an incremental step.
Moving towards a merged solution with one instance of curl,
one set of cookies, etc.,
there is now just one executable, edbrowse,
which runs as either edbrowse or edbrowse-js depending on its name,
like gawk and awk in /bin.
So there are still two processes, same interface, same pipes
and messages, same communication, but one target
that is linked to both names.
Eventually edbrowse will spin off js as a thread,
not a fork/exec of the same program under a different name,
and then we won't have to worry about replicating or losing cookies etc.
This will also smooth out the differences between Linux and Windows,
which are considerable any time you say the words fork exec.
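For anyone following along, the dispatch-on-name trick could look roughly
like this (a sketch only; `is_js_process` is an invented helper, not the
actual edbrowse code, which may test argv[0] differently):

```c
#include <string.h>

/* Decide which personality to run from the name we were invoked by --
 * the same trick gawk/awk and busybox use.  Invented helper; the real
 * edbrowse main() may differ. */
int is_js_process(const char *argv0)
{
	const char *base = strrchr(argv0, '/');
	base = base ? base + 1 : argv0;          /* strip any leading path */
	return strcmp(base, "edbrowse-js") == 0; /* else the browser side */
}
```

main() would call this once on argv[0] and branch into either the
interactive browser loop or the js message loop.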

Chris, I have not touched CMakeLists; could you modify it
to behave the way the makefile does, i.e. no edbrowse.lib,
and cp edbrowse edbrowse-js?
We have to copy in Windows I suppose, no symbolic link,
but this is all temporary anyway, as we eventually won't need edbrowse-js,
so no matter.
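For reference, a hedged sketch of the CMake side (untested; the target
name and build directory are assumed to match the existing CMakeLists):

```cmake
# Hypothetical post-build rule mirroring the makefile's
# "cp edbrowse edbrowse-js"; works on Windows too, since it is
# a real copy rather than a symbolic link.
add_custom_command(TARGET edbrowse POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy
            $<TARGET_FILE:edbrowse>
            ${CMAKE_CURRENT_BINARY_DIR}/edbrowse-js)
```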

Karl Dahlke

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Edbrowse-dev] One program Two processes
  2015-12-23 15:09 [Edbrowse-dev] One program Two processes Karl Dahlke
@ 2015-12-23 18:45 ` Adam Thompson
  2015-12-23 19:07   ` Karl Dahlke
  2015-12-23 19:59   ` Chris Brannon
  0 siblings, 2 replies; 16+ messages in thread
From: Adam Thompson @ 2015-12-23 18:45 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Wed, Dec 23, 2015 at 10:09:28AM -0500, Karl Dahlke wrote:
> this is an incremental step.
> Moving towards a merged solution with one instance of curl,
> one set of cookies etc,
> there is now just one executable, edbrowse,
> which runs as either edbrowse or edbrowse-js depending on its name.
> Like gawk and awk in /bin
> So still two processes, same interface, same pipes
> and messages, same communication, but one target
> that is linked to both names.
> Eventually edbrowse will spin off js as a thread,
> not a fork exec of the same program under a different name,
> and then we won't have to worry about replicating or losing cookies etc.
> Will also smooth out the differences between linux and windows,
> which are considerable any time you say the words fork exec.

My initial reaction to this was... very worried,
certainly about the suggested "thread safe" design.
I've seen projects which do this kind of half-done multi-threading, and they
rarely end well. However, I'd personally be in favour of a hybrid design:
a sort of multi-process, multi-threaded mechanism.

That being said, we can certainly do better than a fork/exec with the copy and
symlink approach. There's no need now to have the executables separated like
that; if we're comfortable linking the functions in the js engine,
simply fork once we've set things up and we get the structures etc. for free.
As for your assertion about stability,
I'm now regularly seeing edbrowse-js segfaults again, not sure why,
although it looks like the process is using a lot of resources when it does this.

I'd rather we had a proper discussion about this stuff before heading off in a
direction like this, particularly as I was planning on doing some xhr stuff.

Cheers,
Adam.



* [Edbrowse-dev]  One program Two processes
  2015-12-23 18:45 ` Adam Thompson
@ 2015-12-23 19:07   ` Karl Dahlke
  2015-12-23 19:59   ` Chris Brannon
  1 sibling, 0 replies; 16+ messages in thread
From: Karl Dahlke @ 2015-12-23 19:07 UTC (permalink / raw)
  To: Edbrowse-dev

> certainly about the suggested "thread safe" design.

The two threads would run separately in time, logically,
because the protocol is the same:
one thread is always waiting for a message from the other,
just as the two processes are essentially serialized.
They never really run in parallel.
Both threads will not run httpConnect at the same time,
for example; it just won't happen.
So I'm not too worried about thread-safety nightmares.
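The lockstep Karl describes could be sketched with a turn flag and a
condition variable (illustrative only; all names are invented, and the
real interface would pass messages rather than bump a counter):

```c
#include <pthread.h>
#include <stddef.h>

/* One turn flag means the two sides never run at once, even though
 * they are separate threads: each side runs only while the other is
 * blocked waiting for "a message" (here, its turn). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static int js_turn;   /* whose turn it is to run */
static int steps;     /* total work steps, both sides combined */

static void *js_side(void *unused)
{
	(void)unused;
	for (int i = 0; i < 3; i++) {
		pthread_mutex_lock(&lock);
		while (!js_turn)
			pthread_cond_wait(&wake, &lock);
		++steps;      /* "run some js", alone */
		js_turn = 0;  /* hand control back to edbrowse */
		pthread_cond_signal(&wake);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int run_lockstep(void)
{
	pthread_t t;
	js_turn = steps = 0;
	pthread_create(&t, NULL, js_side, NULL);
	for (int i = 0; i < 3; i++) {
		pthread_mutex_lock(&lock);
		while (js_turn)
			pthread_cond_wait(&wake, &lock);
		++steps;      /* the edbrowse side runs, alone */
		js_turn = 1;  /* "send a message" to the js side */
		pthread_cond_signal(&wake);
		pthread_mutex_unlock(&lock);
	}
	pthread_join(t, NULL);
	return steps;         /* three turns each, strictly alternating */
}
```

Both loops do their work while holding the lock, so httpConnect (or
anything else) could never run on both sides simultaneously.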

Still, we can keep the processes separate for a while
as we think through the various implications. No problem.
The only step so far is making one image,
which gives us, when making xhr calls or whatever,
all the power of edbrowse.
Call httpConnect and you'll have the network of proxies and certificates,
the cookies in the cookie jar, etc.

> I'm now regularly seeing edbrowse-js seg faults again,

Well, if you have a website ...
I would love to track this down.


Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-23 18:45 ` Adam Thompson
  2015-12-23 19:07   ` Karl Dahlke
@ 2015-12-23 19:59   ` Chris Brannon
  2015-12-23 20:44     ` Karl Dahlke
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Brannon @ 2015-12-23 19:59 UTC (permalink / raw)
  To: edbrowse-dev

Adam Thompson <arthompson1990@gmail.com> writes:

> A sort of multi-process multi-threaded mechanism.

The biggest concern is that we have to share a whole bunch of
HTTP-related state across process boundaries.
Multi-threading sounds like the answer.  Take the state that needs to
be shared across threads, and encapsulate it behind a well-defined
interface to ensure proper synchronization, consistency, etc.
We will have to be much more careful about the use of anything global.
Another solution might be to wrap all of our networking code in its own
process.  Let the edbrowse, edbrowse-js, and other processes communicate
with it using an API based around messages exchanged over pipes.  Again,
this ensures proper and synchronized access to this huge blob of global
state.  Or how about allowing edbrowse-js to call into the main edbrowse
process to make HTTP requests?
Does js really need to have direct, unmediated access to HTTP, or could
it send messages over its pipe to the parent process saying "GET <url>",
"POST <url> <data>", and so forth?

-- Chris


* [Edbrowse-dev]  One program Two processes
  2015-12-23 19:59   ` Chris Brannon
@ 2015-12-23 20:44     ` Karl Dahlke
  2015-12-24 11:19       ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-23 20:44 UTC (permalink / raw)
  To: edbrowse-dev

> The biggest concern is that we have to share a whole bunch of
> HTTP-related state across process boundaries.
> Multi-threading sounds like the answer.

Yes, and the existing message serializing protocol
should prevent the kind of threading nightmares that we've
all experienced in the past.
One thread runs while the other waits.

> wrap all of our networking code in its own process.

Pipes everywhere looks like a mess,
and where does it end, how many more processes?
Remember that IPC is limited on Windows.
I think this creates entropy, whereas one process would probably work just fine.

> allowing edbrowse-js to call into the main edbrowse
> process to make HTTP requests?

We thought about this one before,
when we first learned js had to parse html and create corresponding js objects
right now, as a native method, before the next line of js ran.
Those objects have to be there.
So native code would have to stop and send a message back to edbrowse,
but edbrowse is waiting for the return from running the native code;
it's not waiting for a message that says
"hey, hold up, here's some html: parse it
and then call me reentrantly so I can create those javascript objects,
and then we unwind the stack together and hope that the reentrant calls into js
have not disturbed the thread of execution that was running,
which I can't guarantee for edbrowse or whatever
js engine we're using this week or next week."
Anyway it was way too convoluted, and much easier
to make sure both processes had access to tidy and could do that work.
In the same way, both processes, or both threads,
as you prefer, now have access to httpConnect for making that call.

> Does js really need to have direct, unmediated access to HTTP,

It needs to make the http call before the next line of js code runs.
A native method that does not disturb the flow of execution.


Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-23 20:44     ` Karl Dahlke
@ 2015-12-24 11:19       ` Adam Thompson
  2015-12-24 13:15         ` Karl Dahlke
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Thompson @ 2015-12-24 11:19 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: edbrowse-dev


On Wed, Dec 23, 2015 at 03:44:50PM -0500, Karl Dahlke wrote:
> > The biggest concern is that we have to share a whole bunch of
> > HTTP-related state across process boundaries.
> > Multi-threading sounds like the answer.
> 
> Yes, and the existing message serializing protocol
> should prevent the kind of threading nightmares that we've
> all experienced in the past.
> One thread runs while the other waits.

Not necessarily: there are lots of globals and static buffers in the current
code base, any one of which is a potential threading issue.
Also, we're going to *have to* make js run asynchronously one of these days,
and then all the assumptions about the synchronising nature of the protocol
will go out the window.
If we're multi-threading based on those assumptions, that's going to make a
difficult task even harder.
In addition, such segfaults are usually horrible to debug, since they tend to
be difficult to reproduce (i.e. due to timing issues etc.).
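As a concrete illustration of the static-buffer hazard (an invented
example, not code from edbrowse):

```c
#include <stdio.h>

/* A helper in this common style is fine single-threaded, but with two
 * threads the second caller silently overwrites the first caller's
 * result, because every call shares the one buffer. */
char *int_to_string(int n)
{
	static char buf[32];            /* one buffer for ALL callers */
	snprintf(buf, sizeof buf, "%d", n);
	return buf;
}
```

Every helper like this would need a caller-supplied buffer (or
thread-local storage) before js could truly run in parallel with the
browser side.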

> > wrap all of our networking code in its own process.
> 
> Pipes everywhere, looks like a mess,
> and where does it end, how many more processes?
> Remember that IPC is limited on windows.
> I think this creates entropy, whereas one process would probably work just fine.

No, because we're going to have to pull comms out anyway eventually in order
to get async networking, which is increasingly required.
Managing this by encapsulating it all into a (possibly completely thread-safe
and thus multi-threaded) comms manager makes a lot of sense in this case.

> > allowing edbrowse-js to call into the main edbrowse
> > process to make HTTP requests?
> 
> We thought about this one before,
> when we first learned js had to parse html and create corresponding js objects
> right now, as a native method, before the next line of js ran.
> Those objects have to be there.
> So native code would have to stop and send a message back to edbrowse,
> but it's waiting for the return from running the native code,
> it's not waiting for a message that says
> "hey hold up and here's some html and parse it
> and then call me reentrantly so I can create those javascript objects
> and then we unwind the stack together and hope that the reentrant calls into js
> have not disturbed the thread of execution that was running,
> which I can't guarantee for edbrowse or whatever
> js engine we're using this week or next week."
> Anyway it was way too convoluted, and much easier
> to make sure both processes had access to tidy and could do that work.
> In the same way, both processes, or both threads,
> as you prefer, now have access to httpConnect for making that call.

Actually, although this was a quick fix,
I think that maintaining two concurrent DOMs is only sustainable to a point.
Since the interface is going to have to become more asynchronous anyway,
this should not be so much of an issue in future.

> > Does js really need to have direct, unmediated access to HTTP,
> 
> It needs to make the http call before the next line of js code runs.
> A native method that does not disturb the flow of execution.

Actually, that's incorrect, as far as I know:
although synchronous calls are implemented on top of it,
AJAX is fundamentally asynchronous, event-driven networking.
That's what the state variables are used for; hence the Asynchronous part.
If we keep designing as if the world of the internet is synchronous, then fixing
our design to be asynchronous is going to become increasingly difficult.

Regards,
Adam.
PS: I'm also going to look into portable IPC options to replace the
pipes-everywhere mechanism we're currently using.



* [Edbrowse-dev]   One program Two processes
  2015-12-24 11:19       ` Adam Thompson
@ 2015-12-24 13:15         ` Karl Dahlke
  2015-12-24 18:39           ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-24 13:15 UTC (permalink / raw)
  To: edbrowse-dev

> Also, we're going to *have to* make js run asynchronously one of these days

Well, I hope we don't, in the most literal sense of "asynchronous",
because it's 10 times harder.
Here's what I mean.

Let http requests run in the background,
truly in the background, in parallel with everything else,
because these are the only tasks that might take more than a millisecond.
We already do this with our background downloads, and it is truly asynchronous.
When a request is complete, from xhr for example,
then a piece of js could run, serialized with other pieces of js
and with edbrowse itself.
Separate instances of js never run together, and how could they?
Think AutoCompartment in mozjs 24, which puts you in a specific context.
So js from two different windows can't really run in parallel; it all has to be serialized,
this chunk running in this window in this context in this compartment,
now that chunk in that compartment, etc.
It has to be a timeshare system,
which is better for us anyway, easier to design,
amenable to messages and pipes,
and *much* more threadsafe if we go with threads in a process.

Timers work this way now.
It looks like they're completely asynchronous, and they work just like
they're supposed to, but in reality it's timeshared
to keep the code simple and the interactions down to a minimum.
A snippet of js runs under a timer, or some js code runs on behalf
of edbrowse, or edbrowse itself is editing or formatting or posting
or whatever. All serialized.
The big test is
http://www.eklhad.net/async
run three of these in parallel and edit something in buffer 4.
Timers queue up with the work you are doing, but it all serializes,
so when a timer fires in buffer 1, js runs in context 1,
including AutoCompartment on the js side, and then it finishes and something else happens,
and the buffers and js and doms don't get mixed up,
and they don't get confused with the work you are doing in buffer 4 in the foreground.
This was a fair piece of work to get right,
and it's not trivial, but now that it's timeshared
I'm pretty confident the windows and contexts won't collide.
Oh another test which I have done is to run async as above
but in the same buffer edit or browse something else on top of it,
which pushes async down on the stack.
But it's still there and its timer still runs
and when you use your back key to go back
it has the right time on the display.

This timeshare approach keeps things manageable, and I hope we are able to
continue with it.
It also makes the one process design at least feasible,
which we can continue to bat back and forth.

As for xhr I would probably create a mock timer that fires 5 times a second
and does nothing if the background http is still in progress,
but if it's done then the xhr js snippet runs, under the timer,
and the timer deletes itself and goes away.
It keeps everything timeshared as above.
At first this design seems awkward and contrived,
but it might be a bit like democracy: the worst,
except for all the others.
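That mock timer might look something like this (a sketch with invented
names; running the js snippet is modeled here as bumping a counter, and
the real timer queue in edbrowse is richer than this):

```c
/* A one-shot polling timer for xhr: fires a few times a second, does
 * nothing while the background transfer is running, then runs the
 * completion handler exactly once and deletes itself. */
struct xhr_timer {
	const int *transfer_done;  /* set by the background download */
	int *handler_runs;         /* stands in for running the xhr js */
	int alive;                 /* cleared once the handler has run */
};

/* called from the shared, serialized timer loop, say 5 times a second */
void xhr_timer_tick(struct xhr_timer *t)
{
	if (!t->alive || !*t->transfer_done)
		return;            /* still downloading; check again later */
	++*t->handler_runs;        /* runs serialized with all other js */
	t->alive = 0;              /* one-shot: the timer goes away */
}
```

Because the tick runs from the same serialized loop as every other
snippet of js, the xhr callback never races the foreground work.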

If we do retain our two processes, or even more,
it would be easier if they were the same image, as they are now, and they just
do different things.
We tried to keep edbrowse-js lean and mean, but found that it needed 30%, 50%, 60% of edbrowse,
so now it's all there,
with a global variable that reminds us we're really the js process.
cmake under Windows really works better if each source file is mentioned once,
as Geoff explained to me, so putting things together in libraries is
no problem for two processes but becomes combinatorially difficult for more,
so again one image is easy.
The only downside is you can't even build the basic edbrowse without the js library.
That is a downer,
but fortunately Chris maintains static builds so people can run the program
even if they have trouble building it for any reason.
And we don't have to make our libraries shared or dll,
one image on disk and in memory, just running different ways.

> I think that maintaining two concurrent DOMs is only sustainable to a point.

I think we'll have to forever, in that we want edbrowse to browse even
if there is no js, yet when js does run it mucks with the tree
directly with native methods and that tree changes line by line,
moment by moment, before we can resync with edbrowse.
Anyways it's not easy no matter how you slice it.


Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-24 13:15         ` Karl Dahlke
@ 2015-12-24 18:39           ` Adam Thompson
  2015-12-25  2:29             ` Karl Dahlke
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Thompson @ 2015-12-24 18:39 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: edbrowse-dev


On Thu, Dec 24, 2015 at 08:15:38AM -0500, Karl Dahlke wrote:
> > Also, we're going to *have to* make js run asynchronously one of these days
> 
> Well I hope we don't, in the most literal sense of "asynchronous",
> Because it's 10 times harder.

I agree, but yes, we will need this eventually I think.

> Here's what I mean.
> 
> Let http requests run in the background,
> truly in the background, in parallel with everything else,
> because these are the only tasks that might take more than a millisecond.

Not necessarily: there are js confirmation prompts which could block, and all
sorts of other things, for example encryption in js (yes, this is actually a
thing), and many more.

> We already do this with our background downloads, and it is truly asynchronous.
> When a request is complete, from xhr for example,
> then a piece of js could run, serialized with other pieces of js
> and with edbrowse itself.
> Separate instances of js never run together, and how could they?
> Think AutoCompartment in mozjs 24, which puts you in a specific context.
> So js from two different windows can't really run in parallel, it all has to be serialized,
> this chunk running in this window in this context in this compartment,
> now that chunk in that compartment etc.
> It has to be a timeshare system,
> which is better for us anyways, easier to design,
> amenable to messages and pipes,
> and *much* more threadsafe if we go with threads in a process.

Unfortunately that will start to have problems with ajax etc., so we really will
need asynchronous in the truest sense of the word, and fairly soon now.
This is why I've been pushing for 1 buffer 1 process (well, 2 processes: one for
js and one for edbrowse) as a design.
In that case we have serialisable time sharing etc., async windows,
and, if we do it right, an interface in which one buffer can block whilst you go
on doing other things in other buffers.
It's going to be a lot of work and we'll need a message passing interface which
is a bit more than pipes, but it can be done.

> Timers work this way now.
> It looks like they're completely asynchronous, and they work just like
> they're supposed to, but in reality it's timeshared
> to keep the code simple and the interactions down to a minimum.
> A snippet of js runs under a timer, or some js code runs on behalf
> of edbrowse, or edbrowse itself is editing or formatting or posting
> or whatever. All serialized.
> The big test is
> http://www.eklhad.net/async
> run three of these in parallel and edit something in buffer 4.
> Timers queue up with the work you are doing but it all serializes,
> so when a timer fires in buffer 1, js runs in context 1,
> including AutoCompartment on the js side, and then it finishes and something else happens,
> and the buffers and js and doms don't get mixed up,
> and they don't get confused with the work you are doing in buffer 4 in the foreground,
> and this was a fair piece of work to get right
> and it's not trivial, but now that it's timeshared
> I'm pretty confident the windows and contexts won't collide.
> Oh another test which I have done is to run async as above
> but in the same buffer edit or browse something else on top of it,
> which pushes async down on the stack.
> But it's still there and its timer still runs
> and when you use your back key to go back
> it has the right time on the display.

That's cool, but what if async starts taking a long time,
like some sort of exponential algorithm or something?

> This timeshare approach keeps things manageable, and I hope we are able to
> continue with it.
> It also makes the one process design at least feasible,
> which we can continue to bat back and forth.

I agree time sharing in a single thread is a very nice design where it works;
indeed I've a lot of experience working with such a design.
However it has certain limitations, which is where true multi-threading and/or
multi-processing comes in for parallelisation.
Unfortunately, one can't guarantee that js will execute in a "short"
amount of time, hence the need for some form of truly parallel design.

> As for xhr I would probably create a mock timer that fires 5 times a second
> and does nothing if the background http is still in progress,
> but if it's done then the xhr js snippet runs, under the timer,
> and the timer deletes and goes away.
> It keeps everything timeshared as above.
> At first this design seems awkward and contrived,
> but it might be a bit like democracy, the worst,
> except for all the others.

That has some merits actually, though I'd do it out of the js world somehow.

> If we do retain our two processes, or even more,
> it would be easier if they were the same image, as they are now, and they just
> do different things.
> We tried to keep edbrowse-js lean and mean but found that it needed 30%, 50%, 60% of edbrowse,
> so now it's all there,
> with a global variable that reminds us we're really the js process.
> cmake under windows really works better if each sourcefile is mentioned once,
> as Geoff explained to me, so putting things together in libraries,
> no problem for two processes but becomes combinatorially difficult for more,
> so again one image is easy.
> Only down side is you can't even build the basic edbrowse without the js library.
> That is a downer,
> but fortunately Chris maintains statics so people can run the program
> even if they have trouble building it for any reason.
> And we don't have to make our libraries shared or dll,
> one image on disk and in memory, just running different ways.

I agree with this; it also makes things easier from a packaging and general
sanity point of view. Of course, if we want to get fancier in the future we
could support launching edbrowse and having it plug into the (for example)
comms manager of an existing edbrowse running under the same user.
That'd fix the parallel-instances-in-different-consoles issue.

> > I think that maintaining two concurrent DOMs is only sustainable to a point.
> 
> I think we'll have to forever, in that we want edbrowse to browse even
> if there is no js, yet when js does run it mucks with the tree
> directly with native methods and that tree changes line by line,
> moment by moment, before we can resync with edbrowse.
> Anyways it's not easy no matter how you slice it.

I agree, but I'd like to move toward something where js communicates what it
wants to do to the DOM, or any html it wants parsed, back to edbrowse, and for
edbrowse to then inform js of the DOM as and when it needs to know.
What this basically means is to have all the js DOM operations as wrappers which
actually query the main edbrowse process for the state of the DOM when they're performed.
This also gets us closer to the DOM idea of certain things being truly
read-only, and host objects etc.
Of course there'll be performance impacts, but let's get things standards compliant and working, then optimise later.

Cheers,
Adam.



* [Edbrowse-dev]    One program Two processes
  2015-12-24 18:39           ` Adam Thompson
@ 2015-12-25  2:29             ` Karl Dahlke
  2015-12-25 23:18               ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-25  2:29 UTC (permalink / raw)
  To: edbrowse-dev

Well, as we consider various architectures,
I always keep in the back of my head that we're a couple of volunteers working in our spare time.
(I wonder how many fulltime people work on Chrome, or Edge.)
We may not have the resources to do it the best way.
In fact this year has been an anomaly, with Geoff and Kevin joining us,
and really making good progress.

> That's cool, but what if async starts taking a long time,
> like some sort of exponential algorithm or something?

Then it takes a long time.
It shouldn't, unless the js page is badly written, but if it does, then so be it.
I'm not trying to be glib, just saying we don't have the time or resources
to build a preemptive time-slicing operating system in edbrowse,
one that can context switch in and out of js sessions,
in an engine-independent fashion, or even within the constraints of a fixed engine;
I think we're just not going to have the time for that.
Pieces of js run under the auspices of the engine,
without preemption, in their particular context / window,
and we just hope they are sane.
On a modern computer, a js computation would have to be insane
to last more than a millisecond, and sure, it could happen,
and if it does edbrowse might not react as well as Chrome,
but sometimes we're trying to get it functional,
without necessarily covering all the corner cases.

P.S. If my finances continue to nosedive I might have to return to the work force,
and that's one less edbrowse developer.
Let's hope that doesn't happen.

Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-25  2:29             ` Karl Dahlke
@ 2015-12-25 23:18               ` Adam Thompson
  2015-12-25 23:51                 ` Karl Dahlke
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Thompson @ 2015-12-25 23:18 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: edbrowse-dev


On Thu, Dec 24, 2015 at 09:29:41PM -0500, Karl Dahlke wrote:
> Well as we consider various architectures
> I always keep in the back of my head that we're a couple of volunteers in our spare time.
> (I wonder how many fulltime people work on Chrome, or Edge.)
> We may not have the resources to do it the best way.

We're resource limited in terms of developers, perhaps,
but we should aim for a good design, since this usually makes things easier in the long run.
I think the time this will take to get right is probably worth the effort,
particularly as we're currently progressing at a relatively fast rate.

> In fact this year has been an anomaly, with Geoff and Kevin joining us,
> and really making good progress.

It's also brought us closer to getting more websites working, and we've
gained a lot of momentum as a project because of their hard work.
I think we shouldn't waste this momentum and,
although we may move incrementally, I think we need to increment in the right
direction, even if that makes the individual increments smaller than they
otherwise would be.

> > That's cool, but what if async starts taking a long time,
> > like some sort of exponential algorithm or something?
> 
> Then it takes a long time.
> It shouldn't, unless the js page is badly written, but if it does, then so be it.
> I'm not trying to be glib, just saying we don't have the time or resources
> to build a preemptive time-slicing operating system in edbrowse,
> that can context switch in and out of js sessions,
> in an engine independent fashion, or even within the constraints of a fixed engine -
> I think we're just not going to have the time for that.

And I'm not suggesting for a moment that we do that.
That's why a good architecture is important,
so that the underlying operating system can handle that stuff.
What I mean is that, by using processes and threads correctly,
we can get the async stuff to run without having to manage our own,
instruction level, time slicing.

> Pieces of js run under the auspices of the engine,
> without preemption, in their particular context / window,
> and we just hope they are sane.
> On a modern computer, a js computation would have to be insane
> to last more than a millisecond, and sure it could happen,
> and if it does edbrowse might not react as well as Chrome,
> but sometimes we're trying to get it functional,
> without necessarily covering all the corner cases.

That's true, but, as I said above, I'd rather we take smaller steps towards an
architecture which can, when we have time to implement them,
handle all the corner cases, rather than jump on an easier one which we then
have to rewrite in the future when we need true async stuff.
We've done that before in other areas, and it's taken a while to fix those problems.
Also, there are several well-established design patterns which will completely
kill edbrowse with the new synchronous http.
One of them uses a long-running http request to allow a server to push
information periodically to a web page.
This currently will cause edbrowse to hang indefinitely,
which is probably not what we want. Unfortunately,
things like this are increasingly common, as developers expect AJAX to be
truly asynchronous.
I'm not saying to hold onto what we've currently got, far from it,
but rather I'm saying we need to think about this when designing things.

> P.S. If my finances continue to nosedive I might have to return to the work force,
> and that's one less edbrowse developer.
> Let's hope that doesn't happen.

It'd be a shame if you have to go back to work, but if it happens it happens.
That only makes it even more important to put in the correct foundations now
rather than keep saying we don't have time to create them.

Cheers,
Adam.



* [Edbrowse-dev]     One program Two processes
  2015-12-25 23:18               ` Adam Thompson
@ 2015-12-25 23:51                 ` Karl Dahlke
  2015-12-26  9:11                   ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-25 23:51 UTC (permalink / raw)
  To: edbrowse-dev

> the underlying operating system can handle that stuff.

That's the hope, so let's track it down a little further.

Imagine pieces of js running truly asynchronously of each other
and of the main interactive edbrowse.
Such pieces must run under different processes or different threads.
That's how we leverage the operating system.

Take processes first.
This has two problems.
First is the js pool.
This is specific to mozilla, but that's our engine for now.
If each edbrowse window, roughly a web page,
or even finer granularity perhaps, like a js timer,
if each of these spins up a process to run its js,
that process has to set its memory pool for the max js
that that web page might consume.
99% of the time this is a waste, most pages having just a little js,
but oh well, that's what we have to do,
and after you have pulled up 20 or 30 web pages you are out of memory.
Maybe duktape wouldn't have this problem, but mozilla does.
You need one pool to take advantage of the law of large numbers:
most web pages small, a couple big.

Now the other problem is that pages can be interrelated.
The js on one page can vector through window.frames into the js variables in another page.
No clue how we would do this, since it seems to violate the
AutoCompartment rules of moz js, but let's say there's a way to get past that,
then there is an even bigger mountain to climb if these
exist in separate process spaces.
Remember that windows doesn't have shared memory so we can't just have a common
js pool for all the processes.
Sending messages around every time you need an outside variable,
well maybe, but that's really getting complicated.
I don't see this as promising.

Next is separate threads in one js executing process.
This is much more doable, except, again, if you're looking for
true asynchronicity you have to ask whether the js engine is threadsafe.
Is the compartment and all the variables etc on the stack
so you can switch from thread to thread and execute the js in each?
I honestly don't know, for mozilla or for any of the engines
we've been considering.
We'd have to write some stand-alone tests.

Finally, whether threads or processes, we must remember that any http
request has to vector through one instance of curl,
one http server that mediates all our needs.
Processes are impossible here.
If you want separate http requests to be asynchronous of one another they have to be in threads,
so they can access the same curl space, same cookies, same passwords,
same agent, etc,
but then we have the question of whether the curl calls are threadsafe.
I don't know.
More tests or research needed.


Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-25 23:51                 ` Karl Dahlke
@ 2015-12-26  9:11                   ` Adam Thompson
  2015-12-26 13:36                     ` Karl Dahlke
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Thompson @ 2015-12-26  9:11 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: edbrowse-dev


On Fri, Dec 25, 2015 at 06:51:47PM -0500, Karl Dahlke wrote:
> Imagine pieces of js running truly asynchronously of each other
> and of the main interactive edbrowse.
> Such pieces must run under different processes or different threads.
> That's how we leverage the operating system.

Agreed.

> Take processes first.
> This has two problems.
> First is the js pool.
> This is specific to mozilla but that's our engine for now.
> If each edbrowse window, roughly a web page,
> or even finer granularity perhaps, like a js timer,
> spins up a process to run its js,
> that process has to set its memory pool for the max js
> that that web page might consume.
> 99% of the time this is a waste, most pages having just a little js,
> but oh well, that's what we have to do,
> and after you have pulled up 20 or 30 web pages you are out of memory.
> Maybe duktape wouldn't have this problem but mozilla does.
> You need one pool to take advantage of the law of large numbers.
> Most web pages small, a couple big.
> 
> Now the other problem is that pages can be interrelated.
> The js on one page can vector through window.frames into the js variables in another page.
> No clue how we would do this, since it seems to violate the
> AutoCompartment rules of moz js, but let's say there's a way to get past that,
> then there is an even bigger mountain to climb if these
> exist in separate process spaces.

Yes, I can't remember the name right now but one *can* do cross-compartment
requests somehow. I remember seeing the code for this somewhere,
although that may be out of date like a bunch of mozjs docs.

> Remember that windows doesn't have shared memory so we can't just have a common
> js pool for all the processes.
> Sending messages around every time you need an outside variable,
> well maybe, but that's really getting complicated.
> I don't see this as promising.

No, that's going to cause some issues without shared memory etc.

> Next is separate threads in one js executing process.
> This is much more doable, except, again, if you're looking for
> true asynchronicity you have to ask whether the js engine is threadsafe.
> Is the compartment and all the variables etc on the stack
> so you can switch from thread to thread and execute the js in each?

Yes, that's one of the reasons behind compartments etc.
In fact they have constructs to support this since I think firefox works with threaded js.

> I honestly don't know, for mozilla or for any of the engines
> we've been considering.
> We'd have to write some stand-alone tests.

Duktape looks small enough it could be made thread safe if we need to.

> Finally, whether threads or processes, we must remember that any http
> request has to vector through one instance of curl,
> one http server that mediates all our needs.
> Processes are impossible here.
> If you want separate http requests to be asynchronous of one another they have to be in threads,
> so they can access the same curl space, same cookies, same passwords,
> same agent, etc,
> but then we have the question of whether the curl calls are threadsafe.
> I don't know.
> More tests or research needed.

Yes, again I'm 99% certain they are.

I'm also going to look into IPC mechanisms for Windows since I believe Chrome
uses multi-process somehow. I know they use a different engine but there must
be some way to communicate this stuff if they're really doing things this way.

Cheers,
Adam.


* [Edbrowse-dev]      One program Two processes
  2015-12-26  9:11                   ` Adam Thompson
@ 2015-12-26 13:36                     ` Karl Dahlke
  2015-12-26 15:10                       ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-26 13:36 UTC (permalink / raw)
  To: edbrowse-dev

> I'm also going to look into IPC mechanisms for Windows

Geoff says that if you want portable flexible interprocess communication,
(more flexible than pipes), you have to bite the bullet and use sockets.
Processes listen on certain ports, send messages to each other via tcp,
using send() and recv(), which are both unix and windows calls.
It's rather a pain to set up initially but when it is rolling it works fine.
I'm not looking forward to that,
but as I think about it I'm more convinced he's right,
and why should I have to think about it at all; he's the expert.
He knows.

So we may have to bring back tcp.c from the archives of git.
This was a wrapper file that set up sockets in a unix / windows hiding fashion,
with one common interface.
We used it long before curl, when I did it all by hand,
then threw it away since curl seems to do everything.
But here we may need it again so Chris you might
want to at least find the latest and clean it up etc.
There was also a tcp.h but honestly it was small
and I wouldn't mind folding it into eb.h.

Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-26 13:36                     ` Karl Dahlke
@ 2015-12-26 15:10                       ` Adam Thompson
  2015-12-26 15:23                         ` Karl Dahlke
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Thompson @ 2015-12-26 15:10 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: edbrowse-dev


On Sat, Dec 26, 2015 at 08:36:35AM -0500, Karl Dahlke wrote:
> > I'm also going to look into IPC mechanisms for Windows
> 
> Geoff says that if you want portable flexible interprocess communication,
> (more flexible than pipes), you have to bite the bullet and use sockets.
> Processes listen on certain ports, send messages to each other via tcp,
> using send() and recv(), which are both unix and windows calls.
> It's rather a pain to set up initially but when it is rolling it works fine.
> I'm not looking forward to that,
> but as I think about it I'm more convinced he's right,
> and why should I have to think about it at all; he's the expert.
> He knows.

Agreed, sockets was where I was thinking of heading with this.
I thought Windows had something like unix domain sockets rather than ports,
but I don't know. At any rate,
I'd go for UDP rather than TCP for local IPC,
that way we can easily multiplex sockets rather than having to multiplex TCP connections.
Any thoughts?

Cheers,
Adam.


* [Edbrowse-dev]       One program Two processes
  2015-12-26 15:10                       ` Adam Thompson
@ 2015-12-26 15:23                         ` Karl Dahlke
  2015-12-26 15:40                           ` Adam Thompson
  0 siblings, 1 reply; 16+ messages in thread
From: Karl Dahlke @ 2015-12-26 15:23 UTC (permalink / raw)
  To: edbrowse-dev

> I'd go for UDP rather than TCP for local IPC,

Imagine one edbrowse-ht process, the one with the curl space,
that does all the http https and ftp fetches.
edbrowse connects to edbrowse-ht and says
"fetch this big http file and return it to me,
don't just download it I want it in memory, so return it to me."
edbrowse-ht pulls down a hundred meg file and sends it
back to edbrowse over socket,
but we need perfect fidelity here, we need the continuity
and certainty of the stream.
If we don't use tcp we'll have to reinvent most of it,
so we may as well use tcp, at least for this instance
and probably for most of our communications.
udp is fine for packet voice and such in that if you miss some packets you don't care,
it's barely a click in the sound, but here we need all the bits in
exactly the right order.

Karl Dahlke


* Re: [Edbrowse-dev] One program Two processes
  2015-12-26 15:23                         ` Karl Dahlke
@ 2015-12-26 15:40                           ` Adam Thompson
  0 siblings, 0 replies; 16+ messages in thread
From: Adam Thompson @ 2015-12-26 15:40 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: edbrowse-dev


On Sat, Dec 26, 2015 at 10:23:40AM -0500, Karl Dahlke wrote:
> > I'd go for UDP rather than TCP for local IPC,
> 
> Imagine one edbrowse-ht process, the one with the curl space,
> that does all the http https and ftp fetches.
> edbrowse connects to edbrowse-ht and says
> "fetch this big http file and return it to me,
> don't just download it I want it in memory, so return it to me."
> edbrowse-ht pulls down a hundred meg file and sends it
> back to edbrowse over socket,
> but we need perfect fidelity here, we need the continuity
> and certainty of the stream.
> If we don't use tcp we'll have to reinvent most of it,
> so we may as well use tcp, at least for this instance
> and probably for most of our communications.

Not necessarily. I think you'd want to use temp files for something like that,
otherwise the IPC logic gets complicated.

> udp is fine for packet voice and such in that if you miss some packets you don't care,
> it's barely a click in the sound, but here we need all the bits in
> exactly the right order.

Also, packet loss over a local loopback connection shouldn't happen,
so that's not really an issue either.
From the work I've done with scalable IPC stuff in the past I'd go for a
combined approach of having udp command and response and then temp files for
large downloads. Sure we then have to read back into memory but that makes much
more sense to me than implementing TCP-based IPC in this case.
Also, for the case of most pages, this wouldn't be too much of an issue.

The alternative is a comms process which either has to do tcp multiplexing in a
single thread or has to be multi-threaded even before you get to the comms stuff.
That's going to be awkward to get right.

Cheers,
Adam.


