edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] curl handles and general comms design
@ 2015-12-29 19:17 Adam Thompson
  2015-12-29 19:57 ` Karl Dahlke
  2016-01-05  0:38 ` Chris Brannon
  0 siblings, 2 replies; 9+ messages in thread
From: Adam Thompson @ 2015-12-29 19:17 UTC
  To: Edbrowse-dev

Hi,

I noticed in a recent commit that we're now not keeping curl handles around.
We probably don't want to head in this direction since that also means we're
not doing persistent connections or correct things with cookies.
I also notice that Edbrowse appears to be performing *extremely* poorly (yeah I
know we're looking for functionality first, but still...).
I'm not sure if the two things are related or not but it prompted me to have a
closer look at curl. I've listed some of what I found below,
along with how I think we *could* use it.

First of all, it seems the curl devs and I and Karl (I think?)
were thinking along the same lines in terms of single-threaded multiplexed curl transfers.
To accomplish this there's a curl-multi interface which allows multiple
concurrent transfers with either an event-driven or fd-based (i.e. select or
poll) API.
What this basically means is that we can add curl handles to a stack and then
use an event loop to call curl_multi_perform and curl will handle all the
multiplexing for us, telling us when things need attention.
This also gets us connection sharing and dns cache sharing.

To solve the parallel curl handles accessing cookie databases issue,
there's also the curl-shared interface.
I believe this can be used with the curl-multi interface since the curl-multi
interface is single-threaded so no need for mutexes etc.
This allows us to share cookies between curl handles as well as other data
should we need them.

What this all means I think is that, by combining both interfaces,
we should be able to create a single-threaded, essentially async, comms layer.
This should also fix some of the strange cookie issues we have,
as well as allow us to better use persistent connections and dns caching.
I'm not sure if I'll have time to actually do the coding for this or not,
but I think it's worth discussing.

Any thoughts?

Cheers,
Adam.

* [Edbrowse-dev] curl handles and general comms design
  2015-12-29 19:17 [Edbrowse-dev] curl handles and general comms design Adam Thompson
@ 2015-12-29 19:57 ` Karl Dahlke
  2015-12-29 22:27   ` Chris Brannon
  2015-12-30 12:01   ` Adam Thompson
  2016-01-05  0:38 ` Chris Brannon
  1 sibling, 2 replies; 9+ messages in thread
From: Karl Dahlke @ 2015-12-29 19:57 UTC
  To: Edbrowse-dev

Yes, sorry if edbrowse is in a state of flux right now.
I'm closing curl handles more as a proof of concept,
not saying we need to or want to do that in general.
Can we retain our cookie state with at least one handle open?
We know about the sharing interface and Chris is going to work on that,
and then all the handles should tie together with one cookie state
and then some things should clear up.
Some of the inefficiency could be rereading and rewriting the cookie file
all the time, and that will go away.

I put more of our state variables for downloading files etc
on the stack in preparation for threads.
I used to use static variables and got away with it
since processes forked off, but we know that won't work
so now moving to stacks in preparation for threads.
This is the "curl http background download mini project"
that we agreed on last week,
and it should finish up soon and then we can talk about the next step.

I'm thinking about an edbrowse-http process to handle these curl requests
and maintain the one and only cookie state, but perhaps not just for edbrowse,
perhaps for all instances of edbrowse running.
I run edbrowse from various virtual consoles, sometimes without realizing it,
and indeed one edbrowse could clobber cookies introduced by another edbrowse.
I haven't noticed that specifically but it probably has happened.
So if edbrowse-http became a daemon serving the curl needs
of all edbrowse processes running,
then that would solve any and all cookie collision problems forever more.
Wonder if Chrome and others effectively share the cookie jar and favorites etc
among separately running instances of the browser?
Well it's something to think about.

I also need to resurrect some of the socket routines that are buried in git,
so we can use them.
They're supposed to hide the differences between unix and windows,
and they were tested at one time but that was like 15 years ago,
so I'll want Geoff to look at them again.

Send any bug reports or disasters to me,
and hopefully this will stabilize soon.

Cheers.

Karl Dahlke

* Re: [Edbrowse-dev] curl handles and general comms design
  2015-12-29 19:57 ` Karl Dahlke
@ 2015-12-29 22:27   ` Chris Brannon
  2015-12-30 12:01   ` Adam Thompson
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Brannon @ 2015-12-29 22:27 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> I also need to resurrect some of the socket routines that are buried in git,
> so we can use them.

http://the-brannons.com/tcp.tgz should be the files you are looking for.

-- Chris

* Re: [Edbrowse-dev] curl handles and general comms design
  2015-12-29 19:57 ` Karl Dahlke
  2015-12-29 22:27   ` Chris Brannon
@ 2015-12-30 12:01   ` Adam Thompson
  2015-12-30 12:26     ` Karl Dahlke
  1 sibling, 1 reply; 9+ messages in thread
From: Adam Thompson @ 2015-12-30 12:01 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Tue, Dec 29, 2015 at 02:57:27PM -0500, Karl Dahlke wrote:
> Yes, sorry if edbrowse is in a state of flux right now.
> I'm closing curl handles more as a proof of concept,
> not saying we need to or want to do that in general.

That's ok, I understand that. I'm running the latest dev version, and thus,
as I've said in the past I think, things'll probably break.
What had me a little worried was the number of possible breakages (and the damn
near 30 second page load times for the Jenkins CI system we use at work, as well
as a bunch of other sites).

> Can we retain our cookie state with at least one handle open?

From reading the docs, it appears that the cookies are retained as long as the
share handle is open, though this may not be quite correct.

> We know about the sharing interface and Chris is going to work on that,
> and then all the handles should tie together with one cookie state
> and then some things should clear up.
> Some of the inefficiency could be rereading and rewriting the cookie file
> all the time, and that will go away.

Yes, that makes sense.

> I put more of our state variables for downloading files etc
> on the stack in preparation for threads.
> I used to use static variables and got away with it
> since processes forked off, but we know that won't work
> so now moving to stacks in preparation for threads.
> This is the "curl http background download mini project"
> that we agreed on last week,
> and it should finish up soon and then we can talk about the next step.

As I said in my last email, I don't think we even need threads for this,
just curl-multi with its single-threaded multiplexing.
I work with an architecture similar to this and it genuinely scales very
well, with buffering and callbacks ensuring you get multiplexed transfers.
Given how well-tested curl is, I've no reason to doubt their scalability claims
and I really think that this will work.

> I'm thinking about an edbrowse-http process to handle these curl requests
> and maintain the one and only cookie state, but perhaps not just for edbrowse,
> perhaps for all instances of edbrowse running.
> I run edbrowse from various virtual consoles, sometimes without realizing it,
> and indeed one edbrowse could clobber cookies introduced by another edbrowse.
> I haven't noticed that specifically but it probably has happened.

Agreed.

> So if edbrowse-http became a daemon serving the curl needs
> of all edbrowse processes running,
> then that would solve any and all cookie collision problems forever more.
> Wonder if Chrome and others effectively share the cookie jar and favorites etc
> among separately running instances of the browser?
> Well it's something to think about.

Yes they do by default as far as I know.
I think I'd probably call it something like edbrowse-curl though and then add
in all the curl stuff we do since, as far as I can work out,
the multi interface (not sure about the share interface but quite possibly)
can handle different protocols concurrently.

> I also need to resurrect some of the socket routines that are buried in git,
> so we can use them.
> They're supposed to hide the differences between unix and windows,
> and they were tested at one time but that was like 15 years ago,
> so I'll want Geoff to look at them again.

Yeah, things've probably changed a bit since then.

> Send any bug reports or disasters to me,
> and hopefully this will stabilize soon.

Ok, if I can I'd like to actually do some coding on edbrowse again,
but I'm not sure if we'd just end up with massive conflicts.

Also, thinking about background and foreground downloads,
if we go down the curl-multi route or the separate comms process actually
(single-threaded via curl-multi or multi-threaded...),
then background and foreground (though not memory vs file unfortunately)
simply becomes a question of whether we print dots when we receive a certain
amount of data or not. Since edbrowse will be reading from the comms process
(or from curl events) that could even be toggleable during downloads since the
dot printing and the user would be using the same thread and process.
It'd just be a question of setting a flag somewhere.
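
To make the flag idea concrete, here is a small sketch that uses no curl at all. The struct, the function names and the 4096-byte dot interval are all invented for illustration. Whatever receives data (a curl write callback, or a read from the comms process) would call download_chunk() for each chunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DOT_INTERVAL 4096 /* bytes per dot; arbitrary value for this sketch */

struct download {
    size_t received;    /* total bytes seen so far */
    size_t next_dot;    /* byte count at which the next dot is due */
    size_t dots;        /* dots actually printed */
    bool show_progress; /* true = foreground, false = background */
};

static void download_init(struct download *d, bool show_progress)
{
    d->received = 0;
    d->next_dot = DOT_INTERVAL;
    d->dots = 0;
    d->show_progress = show_progress;
}

/* Account for one chunk of len bytes and print a dot for each interval
 * crossed, but only while the download is in the foreground. */
static void download_chunk(struct download *d, size_t len, FILE *out)
{
    d->received += len;
    while (d->received >= d->next_dot) {
        if (d->show_progress) {
            fputc('.', out);
            d->dots++;
        }
        /* Advance even when silent, so that switching back to the
         * foreground does not print a burst of overdue dots. */
        d->next_dot += DOT_INTERVAL;
    }
    if (d->show_progress)
        fflush(out);
}
```

Toggling between foreground and background mid-download is then just an assignment to show_progress from the same thread that handles user input.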

Regards,
Adam.

* [Edbrowse-dev] curl handles and general comms design
  2015-12-30 12:01   ` Adam Thompson
@ 2015-12-30 12:26     ` Karl Dahlke
  2015-12-30 13:22       ` Adam Thompson
  0 siblings, 1 reply; 9+ messages in thread
From: Karl Dahlke @ 2015-12-30 12:26 UTC
  To: Edbrowse-dev

> Ok, if I can I'd like to actually do some coding on edbrowse again,

That would be awesome.
I know you have a 9 to 5 and your time is limited.

I've been able to put time in over the past year,
tidy and dom representation and separate rendering of the page etc,
and truly enjoyed it to be honest, but if you're ready for a turn
I'll take a break for a bit and work on my 2 books or other things.
edbrowse-curl is fine, we certainly use it for more than http:
ftp, sftp, scp, etc,
but I think it should be the same image doing different things,
as we discovered with edbrowse-js.
Seems silly at first, you don't need all that other machinery,
but we found that more and more stuff creeps in.
The curl server will need information in the config file, so may as well
parse that, and url management, and stringfile.c,
and it needs to know all about plugins, even if it doesn't run the plugin directly
it needs to know about it, and on and on and pretty soon
you have 40 to 50 percent of the code so to ease
distribution you may as well use the same image switching on argv[0].
Like main.c line 481, and then set whichproc = 'c';
(whichproc is e for edbrowse and j for javascript engine.)
Just a thought - see how it plays out.
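
A sketch of what switching on argv[0] could look like. The function name which_process is made up, and the program names are assumed to be edbrowse, edbrowse-js and edbrowse-curl as discussed in this thread. It only handles '/' as a path separator and does not strip a Windows .exe suffix.

```c
#include <string.h>

/* Map the name the program was invoked under to a process role:
 * 'e' for edbrowse, 'j' for the javascript engine, 'c' for the proposed
 * curl server. Anything unrecognized is treated as plain edbrowse. */
static char which_process(const char *argv0)
{
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;

    if (strcmp(base, "edbrowse-js") == 0)
        return 'j';
    if (strcmp(base, "edbrowse-curl") == 0)
        return 'c';
    return 'e';
}
```

main() would then do something like whichproc = which_process(argv[0]); and branch on the result, with the extra names installed as links to the one binary.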

Anyways, I'm happy to let you drive for a bit,
and I'll step back to avoid conflicts, I think we're pretty much
done with our experiments, separate handles do indeed
share everything as expected, and I found socket.c, which you and Geoff can look at,
so I'll submit just one more patch from Kevin,
which implements synchronous xhr,
and then I can just be in a bug fix minor updates mode for a while.
I'll post another message when the xhr code is in, probably later today,
and then you can have a whack at it.

I may post some thoughts on messages to and from edbrowse-curl,
if you don't mind, just to make sure we don't forget anything.
That's probably the first step, a flow description of what kinds of
messages / communication go back and forth.

Cheers,

Karl Dahlke

* Re: [Edbrowse-dev] curl handles and general comms design
  2015-12-30 12:26     ` Karl Dahlke
@ 2015-12-30 13:22       ` Adam Thompson
  2015-12-30 13:41         ` Karl Dahlke
  0 siblings, 1 reply; 9+ messages in thread
From: Adam Thompson @ 2015-12-30 13:22 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Wed, Dec 30, 2015 at 07:26:12AM -0500, Karl Dahlke wrote:
> > Ok, if I can I'd like to actually do some coding on edbrowse again,
> 
> That would be awesome.
> I know you have a 9 to 5 and your time is limited.
> 
> I've been able to put time in over the past year,
> tidy and dom representation and separate rendering of the page etc,
> and truly enjoyed it to be honest, but if you're ready for a turn
> I'll take a break for a bit and work on my 2 books or other things.

If we split things up appropriately we can probably both do stuff.
There are a few things we can work on in parallel and there are a few things we
probably need to get ironed out first.

> edbrowse-curl is fine, we certainly use it for more than http:
> ftp, sftp, scp, etc,
> but I think it should be the same image doing different things,
> as we discovered with edbrowse-js.
> Seems silly at first, you don't need all that other machinery,
> but we found that more and more stuff creeps in.
> The curl server will need information in the config file, so may as well
> parse that, and url management, and stringfile.c,
> and it needs to know all about plugins, even if it doesn't run the plugin directly
> it needs to know about it, and on and on and pretty soon
> you have 40 to 50 percent of the code so to ease
> distribution you may as well use the same image switching on argv[0].
> Like main.c line 481, and then set whichproc = 'c';
> (whichproc is e for edbrowse and j for javascript engine.)
> Just a thought - see how it plays out.

I'll have a look at how we're doing this,
I'm wondering if we actually need the exec at all or if we can fork and then
set the flag and whether that'll do.

> Anyways, I'm happy to let you drive for a bit,
> and I'll step back to avoid conflicts, I think we're pretty much
> done with our experiments, separate handles do indeed
> share everything as expected, and I found socket.c, which you and Geoff can look at,
> so I'll submit just one more patch from Kevin,
> which implements synchronous xhr,
> and then I can just be in a bug fix minor updates mode for a while.
> I'll post another message when the xhr code is in, probably later today,
> and then you can have a whack at it.

Ok, thanks.

> I may post some thoughts on messages to and from edbrowse-curl,
> if you don't mind, just to make sure we don't forget anything.
> That's probably the first step, a flow description of what kinds of
> messages / communication go back and forth.

Yeah, it'd be kind of nice to iron out some of the ipc stuff before we start
splitting up edbrowse any further. I'm not sure if this is completely possible,
but really it'd be nice to have some sort of relatively standard interface
which applies to both edbrowse-js and edbrowse-curl.
Obviously the messages are different,
but it'd be good if we could have some sort of common mechanism such that we don't
have to reinvent everything each time we need a client and server.
Any thoughts? I'm thinking of something like standardising around sockets for
IPC; I really don't think that losing packets with UDP is too much of an issue
over a loopback interface, but we can always use tcp or temp files for larger
transfers.
Again, it'd be nice if whatever we do handles this somewhat transparently
and portably.
If we were unix only my idea would be something like posix message queues (not
the older sysv ones) with temp files for genuinely huge things,
but that's not a possibility on Windows.
Still something with similar functionality would make life easier I think.

Regards,
Adam.

* [Edbrowse-dev] curl handles and general comms design
  2015-12-30 13:22       ` Adam Thompson
@ 2015-12-30 13:41         ` Karl Dahlke
  0 siblings, 0 replies; 9+ messages in thread
From: Karl Dahlke @ 2015-12-30 13:41 UTC
  To: Edbrowse-dev

> I'm wondering if we actually need the exec at all or if we can fork and then
> set the flag and whether that'll do.

This just doesn't work well in windows.
You kind of need the separate process so you can use spawn.
See how Geoff has set up the invocation of edbrowse-js.

I know when I worked on the separation of js, the messages
were the most important part, the code just sort of fell out after that.
So I'll send along some high level thoughts
on curl requests and responses.

Yes, a more generic messaging system, a layer above sockets, might be nice,
and I almost did that with js.
I sort of had to for my own sanity.
The message header includes a length for an error string,
and then the error string, if there is one, (length nonzero),
then the length for the js side effects,
and the side effects, if there are any,
and finally the js item requested or the acknowledgement of the action.
Now I have some general parsing of every message.
I can glom onto the js error, if any,
as part of the message envelope,
and the same for js side effects.
A higher level structure like this might be what you're looking for,
and it doesn't take a lot of coding to see that you need it.
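
The envelope layout described above could be sketched roughly as follows. This is not the actual edbrowse-js wire format. The 32-bit host-order lengths, the length prefix on the final item, and every name here are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One message: optional error text, optional side effects, then the
 * requested item or acknowledgement. A zero length means "absent". */
struct envelope {
    const char *error;   uint32_t error_len;
    const char *effects; uint32_t effects_len;
    const char *body;    uint32_t body_len;
};

/* Append one length-prefixed field at *p and advance *p past it. */
static void put_field(unsigned char **p, const char *data, uint32_t len)
{
    memcpy(*p, &len, sizeof len);
    *p += sizeof len;
    if (len)
        memcpy(*p, data, len);
    *p += len;
}

/* Serialize into a freshly allocated buffer; the caller frees it. */
static unsigned char *envelope_pack(const struct envelope *e, size_t *out_len)
{
    size_t total = 3 * sizeof(uint32_t) +
                   e->error_len + e->effects_len + e->body_len;
    unsigned char *buf = malloc(total);
    unsigned char *p = buf;

    if (!buf)
        return NULL;
    put_field(&p, e->error, e->error_len);
    put_field(&p, e->effects, e->effects_len);
    put_field(&p, e->body, e->body_len);
    *out_len = total;
    return buf;
}

/* Read one length-prefixed field; fails if the buffer is too short. */
static int get_field(const unsigned char **p, const unsigned char *end,
                     const char **data, uint32_t *len)
{
    if ((size_t)(end - *p) < sizeof *len)
        return -1;
    memcpy(len, *p, sizeof *len);
    *p += sizeof *len;
    if ((size_t)(end - *p) < *len)
        return -1;
    *data = (const char *)*p;
    *p += *len;
    return 0;
}

/* Parse a message; the fields point into buf rather than being copied.
 * Returns 0 on success, -1 if the message is truncated or has extra bytes. */
static int envelope_unpack(const unsigned char *buf, size_t len,
                           struct envelope *e)
{
    const unsigned char *p = buf;
    const unsigned char *end = buf + len;

    if (get_field(&p, end, &e->error, &e->error_len) ||
        get_field(&p, end, &e->effects, &e->effects_len) ||
        get_field(&p, end, &e->body, &e->body_len))
        return -1;
    return p == end ? 0 : -1;
}
```

With this shape, the generic parsing (error and side effects) is identical for every message, and only the interpretation of the body differs between a js server and a curl server.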

Karl Dahlke

* Re: [Edbrowse-dev] curl handles and general comms design
  2015-12-29 19:17 [Edbrowse-dev] curl handles and general comms design Adam Thompson
  2015-12-29 19:57 ` Karl Dahlke
@ 2016-01-05  0:38 ` Chris Brannon
  2016-01-08 19:43   ` Adam Thompson
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Brannon @ 2016-01-05  0:38 UTC
  To: Adam Thompson; +Cc: Edbrowse-dev

Adam Thompson <arthompson1990@gmail.com> writes:

> To solve the parallel curl handles accessing cookie databases issue,
> there's also the curl-shared interface.
> I believe this can be used with the curl-multi interface since the curl-multi
> interface is single-threaded so no need for mutexes etc.

We're using that now, but with curl easy handles, rather than curl
multi.  I don't know what would be involved in moving over to curl multi.

> What this all means I think is that, by combining both interfaces,
> we should be able to create a single-threaded, essentially async, comms layer.

The one problem is that you cannot share persistent connections across
curl easy handles.

Basically, you create a curl multi handle, and then add curl easy
handles to it.  So when would we create the individual curl easy
handles?

-- Chris

* Re: [Edbrowse-dev] curl handles and general comms design
  2016-01-05  0:38 ` Chris Brannon
@ 2016-01-08 19:43   ` Adam Thompson
  0 siblings, 0 replies; 9+ messages in thread
From: Adam Thompson @ 2016-01-08 19:43 UTC
  To: Chris Brannon; +Cc: Edbrowse-dev

On Mon, Jan 04, 2016 at 04:38:32PM -0800, Chris Brannon wrote:
> Adam Thompson <arthompson1990@gmail.com> writes:
> 
> > To solve the parallel curl handles accessing cookie databases issue,
> > there's also the curl-shared interface.
> > I believe this can be used with the curl-multi interface since the curl-multi
> > interface is single-threaded so no need for mutexes etc.
> 
> We're using that now, but with curl easy handles, rather than curl
> multi.  I don't know what would be involved in moving over to curl multi.

I think curl multi is another object like curl shared,
so I *think* an easy handle can be in a shared and multi object at the same
time, though I'm really not sure.

> > What this all means I think is that, by combining both interfaces,
> > we should be able to create a single-threaded, essentially async, comms layer.
> 
> The one problem is that you cannot share persistent connections across
> curl easy handles.
> 
> Basically, you create a curl multi handle, and then add curl easy
> handles to it.  So when would we create the individual curl easy
> handles?

Ok, so I'm thinking of something like the following high level approach:
Keep the global curl shared object, but not the global curl easy handle.
The share object will be set for each new curl easy handle.
We create a global curl multi object and have logic to keep checking this for
active transfers and if any are found run the next chunk of the transfer.
For each transfer we add a curl easy handle to the multi object,
and have logic to remove it when the transfer is finished.

If I understand things correctly the shared object should keep track of the
cookies whilst the multi object handles the connections and provides a way of
managing the connections so that the handles can be added when we need them.
We may need some way to get the cookies into and out of curl,
but I'm not sure of the cleanest way to do this.
I'm surprised there isn't a way to get the shared object to write the cookie
file on clean up but if not then we can always keep the global curl easy handle.

Regards,
Adam.
