edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] Messages to and from edbrowse-curl
@ 2015-12-30 22:00 Karl Dahlke
  2016-01-01 14:46 ` Adam Thompson
  0 siblings, 1 reply; 9+ messages in thread
From: Karl Dahlke @ 2015-12-30 22:00 UTC (permalink / raw)
  To: Edbrowse-dev

This is mostly from the point of view of edbrowse and surfing the net,
though I imagine js would issue a subset of these messages, as we've seen through xhr.

Before I start with these, let me say that I think all the mail stuff
should stay just like it is.
It ain't broke, don't fix it.
We don't need to fetch mail: pop3, imap, smtp, through edbrowse-curl.
Let's just leave that to the side for now
and concentrate on the browsing aspects.

Message from edbrowse to curl indicated by *

* Reread the config file, it has changed.
There could be new proxies, new cert file,
new sites that don't require certificates, etc etc,
and edbrowse-curl needs to refresh.
This is a bit like apache reload of /etc/httpd/conf/httpd.conf
Response:ack   - if a response is even required.

* Update a runtime parameter.
These are the commands like vs sr fmp fma hr that influence internet fetches.
We changed the variable in edbrowse,
this is a message to change the same variable in edbrowse-curl.
Response:ack   - if a response is even required.
There's some question of whether these changes are system wide,
which would be the easiest to implement.
Type vs in any running instance of edbrowse and we stop verifying ssl for any
of the running edbrowse programs, because edbrowse-curl isn't doing it; or,
edbrowse-curl maintains instances of these variables for each connected edbrowse.
I really don't know the answer to this and don't have a strong opinion on it.
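
To make the second alternative concrete, here is a C sketch of per-connection state. This is illustration only; the field names map to the toggles above (vs, sr, hr), but nothing here is an agreed interface.

```c
/* Sketch: settings edbrowse-curl might track per connected edbrowse
   instance, seeded from the config file at connect time.
   Names are illustrative, not an agreed protocol. */
#include <stdbool.h>

struct eb_session_opts {
	bool verify_ssl;      /* vs */
	bool send_referrer;   /* sr */
	bool honor_redirects; /* hr */
};

/* defaults for a newly connected edbrowse instance */
void eb_session_opts_init(struct eb_session_opts *o)
{
	o->verify_ssl = true;
	o->send_referrer = true;
	o->honor_redirects = true;
}
```

An "update runtime parameter" message would then flip one field in the sender's record, leaving other connected instances alone.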

* Fetch this url, generic.
curl does most of what is in http today, store cookies,
follow 301 302 redirections if that feature is enabled,
until it reaches data, or some other condition.
Responses:
- The data, coming over a stream or in a temp file or whatever makes sense,
and perhaps the headers as well.
- A request for user name and password, http code 401.
edbrowse-curl shouldn't be doing I/O, edbrowse gets the username
password and sends it back to edbrowse-curl.
- Debugging information if db4, the curl traffic sent and received.
This we just print when we get it and wait for the next response.
- Various urls that the browser is redirected to, if db2.
Again, this is debug information that we just print.
- The new url, the actual location of the page after redirection.
This becomes the "filename" of the page, and the base for relative url resolution.
- url looks like a streaming mime type.
We usually know this from protocol or suffix, but maybe that wasn't clear
until content-type was compared against your plugins.
- Progress dots, one per megabyte. Again we just print these.
- Looks like a file you'll want to download, supply a file name or x to abort
or space to read in memory as usual.
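
For illustration, the request/response traffic above could ride on a trivial line-oriented wire format. This C sketch is hypothetical; the verb strings are placeholders, not anything proposed in this thread.

```c
/* Sketch of a one-line-per-message wire format, "VERB arg\n".
   Verb names like "FETCH" are placeholders. */
#include <stdio.h>
#include <string.h>

/* format a message into buf; returns length, or -1 if it doesn't fit */
int eb_msg_format(char *buf, size_t len, const char *verb, const char *arg)
{
	int n = snprintf(buf, len, "%s %s\n", verb, arg ? arg : "");
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* split "VERB arg\n" in place into verb/arg; returns 0 on success */
int eb_msg_parse(char *line, char **verb, char **arg)
{
	char *sp = strchr(line, ' ');
	if (!sp)
		return -1;
	*sp = '\0';
	*verb = line;
	*arg = sp + 1;
	sp = strchr(*arg, '\n');
	if (sp)
		*sp = '\0';
	return 0;
}
```

Bulk data (page bodies, debug spew) would go over a separate stream or temp file as discussed above; only the control messages need framing like this.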

* Fetch this url and download to a file.
This is followup from the previous response, if you requested download to disk.
We should send the final url, not the initial url.
Indicate foreground or background.
It's possible that edbrowse-curl doesn't care which,
just a matter of whether edbrowse waits for this to finish or moves on,
but it does matter because edbrowse-curl only sends the progress dots
if download is in the foreground.
Or maybe just an independent parameter for dots or no dots.
They aren't printed when fetching javascript, for instance.
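
The dot logic itself is small either way. A C sketch of the counting, with a flag standing in for the foreground/background (or independent dots-on/dots-off) decision; in edbrowse-curl this would hang off libcurl's progress callback (CURLOPT_XFERINFOFUNCTION), which is omitted here so the logic stands alone.

```c
/* Sketch: one progress dot per completed megabyte. */
#include <stdio.h>

/* pure helper: whole megabytes completed so far */
long long megabytes_done(long long bytes)
{
	return bytes / (1024LL * 1024LL);
}

/* called on each progress update; prints any newly earned dots.
   dots_printed persists across calls; dots_enabled models the
   foreground/background (or independent dots) choice above. */
void emit_dots(long long dlnow, long long *dots_printed, int dots_enabled)
{
	if (!dots_enabled)
		return;
	while (*dots_printed < megabytes_done(dlnow)) {
		fputc('.', stderr);
		(*dots_printed)++;
	}
}
```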

* cookie name=value
This is the result of <meta http-equiv=set-cookie> or by javascript
document.cookie = "foo=bar";
Response:ack   - if a response is even required.

* Get cookies for this url.
These are used to populate document.cookie when javascript starts.
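
libcurl hands cookies back via CURLINFO_COOKIELIST as Netscape-format lines: seven tab-separated fields with name and value last. A sketch of converting one such line to the name=value form document.cookie wants; error handling is minimal, this is illustration rather than edbrowse's actual cookie code.

```c
/* Sketch: ".example.com\tTRUE\t/\tFALSE\t0\tfoo\tbar" -> "foo=bar".
   Returns 0 on success, -1 if the line doesn't have 7 fields. */
#include <stdio.h>
#include <string.h>

int cookie_line_to_pair(const char *line, char *out, size_t outlen)
{
	const char *p = line;
	int tabs = 0;
	/* skip domain, subdomain flag, path, secure flag, expiry */
	while (tabs < 5) {
		p = strchr(p, '\t');
		if (!p)
			return -1;
		p++;
		tabs++;
	}
	const char *name = p;
	const char *tab = strchr(name, '\t');
	if (!tab)
		return -1;
	const char *value = tab + 1;
	int n = snprintf(out, outlen, "%.*s=%s",
			 (int)(tab - name), name, value);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```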

That's my list for now.
I think that would give us today's functionality.

Karl Dahlke

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Edbrowse-dev] Messages to and from edbrowse-curl
  2015-12-30 22:00 [Edbrowse-dev] Messages to and from edbrowse-curl Karl Dahlke
@ 2016-01-01 14:46 ` Adam Thompson
  2016-01-01 15:40   ` Karl Dahlke
  0 siblings, 1 reply; 9+ messages in thread
From: Adam Thompson @ 2016-01-01 14:46 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


A few thoughts:
On Wed, Dec 30, 2015 at 05:00:44PM -0500, Karl Dahlke wrote:
> * Update a runtime parameter.
> These are the commands like vs sr fmp fma hr that influence internet fetches.
> We changed the variable in edbrowse,
> this is a message to change the same variable in edbrowse-curl.
> Response:ack   - if a response is even required.
> There's some question of whether these changes are system wide,
> which would be the easiest to implement.
> Type vs in any running instance of edbrowse and we stop verifying ssl for any
> of the running edbrowse programs, because edbrowse-curl isn't doing it; or,
> edbrowse-curl maintains instances of these variables for each connected edbrowse.
> I really don't know the answer to this and don't have a strong opinion on it.

I'm thinking probably that each user would need their own edbrowse-curl,
and that within each edbrowse-curl we'd keep track of the individual instances
of edbrowse connected to it. This handles a bunch of otherwise potentially odd 
corner cases.

> * Fetch this url, generic.
> curl does most of what is in http today, store cookies,
> follow 301 302 redirections if that feature is enabled,
> until it reaches data, or some other condition.
> Responses:
> - The data, coming over a stream or in a temp file or whatever makes sense,
> and perhaps the headers as well.
> - A request for user name and password, http code 401.
> edbrowse-curl shouldn't be doing I/O, edbrowse gets the username
> password and sends it back to edbrowse-curl.
> - Debugging information if db4, the curl traffic sent and received.
> This we just print when we get it and wait for the next response.
> - Various urls that the browser is redirected to, if db2.
> Again, this is debug information that we just print.
> - The new url, the actual location of the page after redirection.
> This becomes the "filename" of the page, and the base for relative url resolution.
> - url looks like a streaming mime type.
> We usually know this from protocol or suffix, but maybe that wasn't clear
> until content-type was compared against your plugins.
> - Progress dots, one per megabyte. Again we just print these.

No, I think we just download the data,
maybe have a status message, and then it's up to edbrowse what it does.

> - Looks like a file you'll want to download, supply a file name or x to abort
> or space to read in memory as usual.

Again, I'd just download the data; if we want to pull it into memory then we
can potentially do that, but meanwhile the data download carries on.
I'm thinking that makes more sense because in the case of small files the IO
penalty is probably not that high, and in the case of large files we probably
don't want to load them into memory anyway.
That also gives us a way to implement a sort of page cache in the future if we 
want to.

> * Fetch this url and download to a file.
> This is followup from the previous response, if you requested download to disk.
> We should send the final url, not the initial url.
> Indicate foreground or background.
> It's possible that edbrowse-curl doesn't care which,
> just a matter of whether edbrowse waits for this to finish or moves on,
> but it does matter because edbrowse-curl only sends the progress dots
> if download is in the foreground.
> Or maybe just an independent parameter for dots or no dots.
> They aren't printed when fetching javascript, for instance.

See above for thoughts on the dots. At the end of the day, that's a UI decision, i.e.
on my internet connection I'd quite like to switch them off entirely tbh
because, depending on the server, downloads happen either so fast I can't count the dots or so
slowly that having a random dot appearing is not very useful when I don't know
the actual file size.
Try counting how far through a 200 meg download you are in dot form,
maybe your personal speech adapter does that but on a braille display it's
just a huge string of dots and speakup also doesn't do that as far as I know.
What I'd actually like is a way to find out the file size (usually from the
content-length header) and the actual amount (to the byte preferably)
that's actually been downloaded so I can see how long I have to wait.
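
That readout is cheap to produce once edbrowse-curl reports raw byte counts; a C sketch, with the formatting choices as placeholders and -1 standing for a missing content-length header:

```c
/* Sketch: "amount/total" progress readout from bytes downloaded
   and the content-length (or -1 when the server didn't send one). */
#include <stdio.h>

void format_progress(char *buf, size_t len,
		     long long done, long long total)
{
	if (total >= 0)
		snprintf(buf, len, "%lld/%lld bytes", done, total);
	else
		snprintf(buf, len, "%lld bytes (total unknown)", done);
}
```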

> * cookie name=value
> This is the result of <meta http-equiv=set-cookie> or by javascript
> document.cookie = "foo=bar";
> Response:ack   - if a response is even required.
> 
> * Get cookies for this url.
> These are used to populate document.cookie when javascript starts.

That makes sense.

Cheers,
Adam.



* [Edbrowse-dev]  Messages to and from edbrowse-curl
  2016-01-01 14:46 ` Adam Thompson
@ 2016-01-01 15:40   ` Karl Dahlke
  2016-01-01 18:24     ` Adam Thompson
  0 siblings, 1 reply; 9+ messages in thread
From: Karl Dahlke @ 2016-01-01 15:40 UTC (permalink / raw)
  To: Edbrowse-dev

> I'm thinking probably that each user would need their own edbrowse-curl,

If I, as a single user, could still have multiple instances
of edbrowse connecting to it, from different consoles,
accessing similar websites and sharing one cookie space,
then that makes sense and is fine.
/tmp/edbrowse directory could contain a file per user indicating
the process of edbrowse-curl running, if such is running,
/tmp/edbrowse/run-curl-user,
or spawn one otherwise, similar to the /var/run files in linux
though I don't think there is such a standard in windows
so we may as well use /tmp/edbrowse for this purpose on both OSs,
again keeping the code as much the same in both worlds.
This directory already exists and is used for temp files for plugins,
e.g. converting pdf to html in preparation for browse, etc etc.
Could also be used for temp files if we choose to use that mechanism
to pass the http data back to edbrowse.
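
The pidfile naming is easy to pin down; a minimal C sketch of the path builder. The /tmp/edbrowse/run-curl-user layout is the proposal above; reading the pid, checking liveness (e.g. kill(pid, 0) on unix), and spawning are deliberately left out.

```c
/* Sketch: build "/tmp/edbrowse/run-curl-<user>".
   Returns 0 on success, -1 if the path doesn't fit. */
#include <stdio.h>

int curl_pidfile_path(char *buf, size_t len, const char *user)
{
	int n = snprintf(buf, len, "/tmp/edbrowse/run-curl-%s", user);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```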

> No, I think we just download the data,
> maybe have a status message, and then it's up to edbrowse what it does.

It could be a personal preference, but I don't want it to work that way.
I don't want to sit there not knowing, and then later the download is done,
I want to see the progress dots as they happen, in real time,
otherwise why have them. I mean that's the whole point.
I also like to see the debug messages as they happen, in real time,
not spewing out at the end after download.
More than once edbrowse has been in an infinite loop,
despite our loop counter in http.c, an infinite redirection loop,
and I see that with db2, see the websites it is vectoring through,
or maybe the same site again and again, and I hit control c to stop it.
Anyways that's just an example - I want to see things as they happen.
It really isn't hard to do.

> > - Looks like a file you'll want to download, supply a file name or x to abort
> Again, I'd just download the data; if we want to pull it into memory then we
> can potentially do that, but meanwhile the data download carries on.

Small files just don't matter, and for big downloads you've
lost your flexibility here.
Maybe you don't have enough ram for that big iso image,
and maybe /tmp/edbrowse, or wherever you put it by default,
doesn't have room either.
It's on a basic windows drive like c: and you really have room on d:,
so curl asks you where you want to download it as soon as it gets the headers
and you direct it somewhere else.
It's a little more work to do it this way,
but not much more really, a few messages,
and that's how it works now,
and it will let some people do some things on some computers
that could not be done with an autodownload into a predefined place
and then figure it out later philosophy.

> on my internet connection I'd quite like to switch them off entirely tbh

That could be an easy toggle command.

> because, depending on the server, downloads happen either so fast I can't count the dots or so
> slowly that having a random dot appearing is not very useful when I don't know
> the actual file size.

I guess I don't understand the second part.
Large downloads would happen in background, and you don't get the dots then anyways.

> Try counting how far through a 200 meg download you are in dot form,

It's easy, I just hit control w.
Again, that's my adapter, and maybe why I like the feature so much.

> content-length header) and the actual amount (to the byte preferably)
> that's actually been downloaded so I can see how long I have to wait.

sure, like a progress bar or percent indicator.
Actually I like that less but could be done and selected by the user,
dots or percent or quiet.


Karl Dahlke


* Re: [Edbrowse-dev] Messages to and from edbrowse-curl
  2016-01-01 15:40   ` Karl Dahlke
@ 2016-01-01 18:24     ` Adam Thompson
  2016-01-01 19:05       ` Karl Dahlke
  0 siblings, 1 reply; 9+ messages in thread
From: Adam Thompson @ 2016-01-01 18:24 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Fri, Jan 01, 2016 at 10:40:35AM -0500, Karl Dahlke wrote:
> > I'm thinking probably that each user would need their own edbrowse-curl,
> 
> If I, as a single user, could still have multiple instances
> of edbrowse connecting to it, from different consoles,
> accessing similar websites and sharing one cookie space,
> then that makes sense and is fine.
> /tmp/edbrowse directory could contain a file per user indicating
> the process of edbrowse-curl running, if such is running,
> /tmp/edbrowse/run-curl-user,
> or spawn one otherwise, similar to the /var/run files in linux
> though I don't think there is such a standard in windows
> so we may as well use /tmp/edbrowse for this purpose on both OSs,
> again keeping the code as much the same in both worlds.
> This directory already exists and is used for temp files for plugins,
> e.g. converting pdf to html in preparation for browse, etc etc.
> Could also be used for temp files if we choose to use that mechanism
> to pass the http data back to edbrowse.

Agreed, though we could just as well use the correct directories on Linux and
Windows if we're hiding all of this stuff in functions anyway.

> > No, I think we just download the data,
> > maybe have a status message, and then it's up to edbrowse what it does.
> 
> It could be  a personal preference, but I don't want it to work that way.
> I don't want to sit there not knowing, and then later the download is done,
> I want to see the progress dots as they happen, in real time,
> otherwise why have them. I mean that's the whole point.

Yeah, it depends on how you use it. To clarify what I'm saying,
I'm proposing that edbrowse-curl supports a status message,
and then it is up to edbrowse whether it checks the status of the download and
prints dots, checks the status of the download on command or whatever.
Basically though, edbrowse-curl doesn't fire dots at edbrowse in realtime
because that's a UI thing first and foremost.

> I also like to see the debug messages as they happen, in real time,
> not spewing out at the end after download.
> More than once edbrowse has been in an infinite loop,
> despite our loop counter in http.c, an infinite redirection loop,
> and I see that with db2, see the websites it is vectoring through,
> or maybe the same site again and again, and I hit control c to stop it.
> Anyways that's just an example - I want to see things as they happen.
> It really isn't hard to do.

But it is bad design. What I'm thinking we should do here is again have
edbrowse-curl produce all this stuff (either as messages or on command,
maybe using a polling loop of some kind)
and then leave it up to edbrowse what it does, i.e.
we may want to put them in a log etc.
Actually, I'd quite like a way to enable debug logging to a file from within an
edbrowse session without having to quit and start script etc.

> > > - Looks like a file you'll want to download, supply a file name or x to abort
> > Again, I'd just download the data; if we want to pull it into memory then we
> > can potentially do that, but meanwhile the data download carries on.
> 
> Small files just don't matter, and for big downloads you've
> lost your flexibility here.
> Maybe you don't have enough ram for that big iso image,
> and maybe /tmp/edbrowse, or wherever you put it by default,
> doesn't have room either.
> It's on a basic windows drive like c: and you really have room on d:,
> so curl asks you where you want to download it as soon as it gets the headers
> and you direct it somewhere else.
> It's a little more work to do it this way,
> but not much more really, a few messages,
> and that's how it works now,
> and it will let some people do some things on some computers
> that could not be done with an autodownload into a predefined place
> and then figure it out later philosophy.

Yeah that makes sense actually, but what do we do with the download in the 
meantime?
I'm not sure if we should just pause waiting or not.
Perhaps start writing to a temp file,
then when the user selects a file or whatever dump what we've downloaded so far
to the new location? I'm just not sure what servers'll do if we stop reading
from them for a while, probably drop the connection.
I'm also thinking that we have a max size for direct transfers and anything
larger goes through a temp file regardless.
That's probably best and allows udp without streaming issues.

> > on my internet connection I'd quite like to switch them off entirely tbh
> 
> That could be an easy toggle command.

That'd be quite nice in any case. Do you think you can do that in parallel?

> > because, depending on the server, downloads happen either so fast I can't count the dots or so
> > slowly that having a random dot appearing is not very useful when I don't know
> > the actual file size.
> 
> I guess I don't understand the second part.
> Large downloads would happen in background, and you don't get the dots then anyways.

It depends really. It'd be nice to have some indication what's happening which
gave me information I could use, I can't really count the dots and yet
sometimes I need to load large documents (think pdf manuals,
yes some of them are huge) to actually view them.

> > Try counting how far through a 200 meg download you are in dot form,
> 
> It's easy, I just hit control w.
> Again, that's my adapter, and maybe why I like the feature so much.

Yes, that's adapter specific as far as I know,
and we really need edbrowse to be useful with or without it.

> > content-length header) and the actual amount (to the byte preferably)
> > that's actually been downloaded so I can see how long I have to wait.
> 
> sure, like a progress bar or percent indicator.
> Actually I like that less but could be done and selected by the user,
> dots or percent or quiet.

Yeah, sounds good. I'd probably print something like <amount downloaded>/<total
size> as a progress indicator rather than converting to percent but again
that's personal preference.

Cheers,
Adam.



* [Edbrowse-dev]   Messages to and from edbrowse-curl
  2016-01-01 18:24     ` Adam Thompson
@ 2016-01-01 19:05       ` Karl Dahlke
  2016-01-01 20:18         ` Adam Thompson
  0 siblings, 1 reply; 9+ messages in thread
From: Karl Dahlke @ 2016-01-01 19:05 UTC (permalink / raw)
  To: Edbrowse-dev

> Yeah that makes sense actually, but what do we do with the download
> in the meantime?
> I'm not sure if we should just pause waiting or not.

No I don't think that's good.
Servers will time out etc.
Nor do I think we should start writing to a temp file and move data around later.
I can tell you what we do today and hope that helps.
I analyze the headers as they come in and compare them
against standards and plugins etc.
If it is not text, and not something you know you want to download to play
as a plugin, say usually some kind of binary,
then I stop the download after the headers,
as though it were a HEAD call.
There's a curl command to stop the download gracefully.
Once I know where to send the data, I restart the call.
I get the headers and the data and into the file it goes
and it all works without trouble.

I imagine edbrowse-curl would analyze the headers as we do now,
and if a potential download, it sends the message back to edbrowse and stops.
Then edbrowse sends message back restarting the download
into the designated file,
thus the different messages: fetch generic and fetch into a named file.
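
In libcurl terms the stop can happen from the header callback, which aborts the transfer when it returns a short count (the handle then fails with CURLE_WRITE_ERROR and can be reissued against the final url). The classification step might look like this toy sketch; it is not edbrowse's real plugin matching, which compares content-type against the configured plugins.

```c
/* Sketch of the header-stage decision: keep fetching text to render,
   or stop after the headers and ask edbrowse where the data goes
   (file, memory, or abort). Toy classification only. */
#include <string.h>

enum fetch_disposition { FD_RENDER, FD_STOP_AND_ASK };

enum fetch_disposition classify_content_type(const char *ctype)
{
	/* text we keep fetching and render in the buffer */
	if (strncmp(ctype, "text/", 5) == 0)
		return FD_RENDER;
	/* anything else: stop, report back, await a destination */
	return FD_STOP_AND_ASK;
}
```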

Sure, I'll work on some different progress options,
including quiet, no progress indicators at all,
just the file size when done.
I'm happy to work on little features in parallel,
and there seems to be plenty of them.

cheers.

Karl Dahlke


* Re: [Edbrowse-dev] Messages to and from edbrowse-curl
  2016-01-01 19:05       ` Karl Dahlke
@ 2016-01-01 20:18         ` Adam Thompson
  2016-01-01 20:42           ` Karl Dahlke
  2016-01-01 20:45           ` Karl Dahlke
  0 siblings, 2 replies; 9+ messages in thread
From: Adam Thompson @ 2016-01-01 20:18 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Fri, Jan 01, 2016 at 02:05:34PM -0500, Karl Dahlke wrote:
> > Yeah that makes sense actually, but what do we do with the download
> > in the meantime?
> > I'm not sure if we should just pause waiting or not.
> 
> No I don't think that's good.
> Servers will time out etc.
> Nor do I think we should start writing to a temp file and move data around later.

I agree none of these solutions are ideal but I'm not sure about the correct 
solution here.

> I can tell you what we do today and hope that helps.
> I analyze the headers as they come in and compare them
> against standards and plugins etc.
> If it is not text, and not something you know you want to download to play
> as a plugin, say usually some kind of binary,
> then I stop the download after the headers,
> as though it were a HEAD call.
> There's a curl command to stop the download gracefully.
> Once I know where to send the data, I restart the call.

My issue here is with any fancy redirects we may encounter.
Theoretically they should work but I can think of cases where this may break
when we try and restart the download.

> I get the headers and the data and into the file it goes
> and it all works without trouble.

Agreed.

> I imagine edbrowse-curl would analyze the headers as we do now,
> and if a potential download, it sends the message back to edbrowse and stops.
> Then edbrowse sends message back restarting the download
> into the designated file,
> thus the different messages: fetch generic and fetch into a named file.

I'd rather not do the restart if at all possible.
I'd rather download and move data, that's not too bad.
Maybe we have a maximum grace time after which we stop,
but I'd rather not just stop the download.
Or work out how to do either a head call for http or an ftp call (I'm fairly
sure I've seen one) since any server *should* support that.

> Sure, I'll work on some different progress options,
> including quiet, no progress indicators at all,
> just the file size when done.

The more I think about it, the more I like the 7/235 megabytes option printed
at a fixed rate.
If we can also get status into the background download monitoring as well
that'd be great.

> I'm happy to work on little features in parallel,
> and there seems to be plenty of them.

Brilliant, thanks.

Cheers,
Adam.
PS: I can't remember but I know we have a plugins off and plugins on,
but do we have a plugins ask as well?
There are times when I'd like to choose whether to use a plugin at the point of download.



* [Edbrowse-dev]    Messages to and from edbrowse-curl
  2016-01-01 20:18         ` Adam Thompson
@ 2016-01-01 20:42           ` Karl Dahlke
  2016-01-02 11:46             ` Adam Thompson
  2016-01-01 20:45           ` Karl Dahlke
  1 sibling, 1 reply; 9+ messages in thread
From: Karl Dahlke @ 2016-01-01 20:42 UTC (permalink / raw)
  To: Edbrowse-dev

> My issue here is with any fancy redirects we may encounter.

The "stop and ask" where to download the file happens after all the redirects.
We have the actual url, cookies set,
authorizing user password if any, it's all in place,
and it all runs when restarted.
I don't think there's a problem here.

Karl Dahlke


* [Edbrowse-dev]    Messages to and from edbrowse-curl
  2016-01-01 20:18         ` Adam Thompson
  2016-01-01 20:42           ` Karl Dahlke
@ 2016-01-01 20:45           ` Karl Dahlke
  1 sibling, 0 replies; 9+ messages in thread
From: Karl Dahlke @ 2016-01-01 20:45 UTC (permalink / raw)
  To: Edbrowse-dev

> If we can also get status into the background download monitoring as well

Yeah that would be cool.

> do we have a plugins ask as well?

No, and that would be cool too.
Sometimes I have wanted that.

Karl Dahlke


* Re: [Edbrowse-dev] Messages to and from edbrowse-curl
  2016-01-01 20:42           ` Karl Dahlke
@ 2016-01-02 11:46             ` Adam Thompson
  0 siblings, 0 replies; 9+ messages in thread
From: Adam Thompson @ 2016-01-02 11:46 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Fri, Jan 01, 2016 at 03:42:11PM -0500, Karl Dahlke wrote:
> > My issue here is with any fancy redirects we may encounter.
> 
> The "stop and ask" where to download the file happens after all the redirects.
> We have the actual url, cookies set,
> authorizing user password if any, it's all in place,
> and it all runs when restarted.
> I don't think there's a problem here.

If people play nice then no, but my worry is single access downloads where a
HEAD request may be special cased to not trigger the download lock,
but a GET request may alter the cookie such that a subsequent GET to the same
URL actually requires re-running all the fancy js-based auth in front of the 
download.
It's probably unlikely but I can certainly imagine implementing such a system
in certain circumstances and think it's worth handling if we can.

Cheers,
Adam.



end of thread, other threads:[~2016-01-02 11:45 UTC | newest]

