edbrowse-dev - development list for edbrowse
From: Karl Dahlke <eklhad@comcast.net>
To: Edbrowse-dev@lists.the-brannons.com
Subject: [Edbrowse-dev] the i_get structure
Date: Fri, 02 Mar 2018 17:35:24 -0500
Message-ID: <20180202173524.eklhad@comcast.net>


Ok, this is kinda long. I rewrote about 800 lines of code, in 11 different files, and pushed, and hope it didn't break anything. Here is the roadmap.

1. It would be nice if we could fetch all the javascript files from the internet in parallel, and the css files too. I'm sure other browsers do that. It would speed things up.
Sometimes there are 10 or more of these files to fetch.
We already spin off processes to download files in the background, so you'd think the machinery is mostly there.
It's unix only, but I don't care; I don't think I have a single windows user at this point.
I could fork off and download to a temp file and when done read that temp file into the js <script> object and off we go.
But it seems less than ideal. There's just a gut feeling that threads would be better.
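
Here is roughly what the thread version could look like, just to make that gut feeling concrete. This is only a sketch, not edbrowse code: fetchJob, fetchThread, and fetchAllScripts are made-up names, each download gets its own curl easy handle (curl says never share an easy handle between threads), and curl_global_init() has to be called once before any threads start.

    #include <curl/curl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    /* one download job per external script or css file */
    struct fetchJob {
        const char *url;   /* where the file lives */
        char *data;        /* filled in by the write callback */
        size_t len;
        CURLcode rc;
    };

    static size_t saveData(char *ptr, size_t size, size_t nmemb, void *userdata)
    {
        struct fetchJob *j = userdata;
        size_t n = size * nmemb;
        char *p = realloc(j->data, j->len + n + 1);
        if (!p)
            return 0;   /* tells curl to abort this transfer */
        j->data = p;
        memcpy(j->data + j->len, ptr, n);
        j->len += n;
        j->data[j->len] = 0;
        return n;
    }

    static void *fetchThread(void *arg)
    {
        struct fetchJob *j = arg;
        CURL *h = curl_easy_init();   /* each thread has its own easy handle */
        curl_easy_setopt(h, CURLOPT_URL, j->url);
        curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, saveData);
        curl_easy_setopt(h, CURLOPT_WRITEDATA, j);
        j->rc = curl_easy_perform(h);
        curl_easy_cleanup(h);
        return NULL;
    }

    /* kick off all the fetches at once, then wait; the files land in jobs[i].data */
    void fetchAllScripts(struct fetchJob *jobs, int n)
    {
        pthread_t *tids = malloc(n * sizeof(pthread_t));
        int i;
        for (i = 0; i < n; ++i)
            pthread_create(&tids[i], NULL, fetchThread, &jobs[i]);
        for (i = 0; i < n; ++i)
            pthread_join(tids[i], NULL);
        free(tids);
    }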

2. When a script is marked async, it can run asynchronously.
I don't think we're ever going to do that. I need a team of engineers for that, not just my spare time.
But ... async can mean postpone.
It means the browse can finish and you can start looking at the file while that script runs.
I can just put it on a timer.
Like in 10 seconds go ahead and run the async script.
Ok but in ten seconds you may find yourself locked out for a few seconds while the script runs, which is weird.
And if that script does an xhr, and the internet is slow, then you're blocked for 20 seconds.
Well, what if that postponed script, or perhaps timers in general, ran in another thread while you look around?
Now here's the thing, js + my dom + edbrowse will never be threadsafe.
So if a js timer or script is running over there, you can't run js in the foreground over here.
It runs here or there, not both.
If you try, like clicking on a button or doing anything that involves js, then I have to stop and block wait for the other thread to finish.
But this is not likely, and still better, I think, than doing all that stuff during browse, where you have to wait for it even to see the page.
A lot of these async scripts are google analytics or google ads etc, that we just don't care about,
so let them run when they run, and they might update some lines on the page, filling in the ads, which you can look at if you care, or not.
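
To make the here-or-there rule concrete: one mutex around the js machinery does it. A minimal sketch, assuming some runScript() entry point that actually executes the js (that name is made up); the async thread and the keyboard thread both go through the same lock, so whoever gets there second just waits.

    #include <pthread.h>

    void runScript(const char *code);   /* stands in for whatever runs js */

    static pthread_mutex_t jsLock = PTHREAD_MUTEX_INITIALIZER;

    /* the timer thread, running the postponed async script */
    void runAsyncScript(const char *code)
    {
        pthread_mutex_lock(&jsLock);
        runScript(code);
        pthread_mutex_unlock(&jsLock);
    }

    /* the foreground thread: clicking a button while that script runs
     * simply blocks here until the other thread lets go */
    void onButtonClick(const char *handler)
    {
        pthread_mutex_lock(&jsLock);
        runScript(handler);
        pthread_mutex_unlock(&jsLock);
    }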

3. The key to parallel downloads is a threadsafe curl system, not just curl itself but all the machinery we built around it.
I'm not expecting to run js in parallel in two separate threads, but we need to run curl in two separate threads, or in 10 separate threads.
So where does that leave us?

4. There are the basics of running curl in a threadsafe fashion, and maybe Chris can help me with this.
I know you have a 9 to 5, but maybe you remember reading about it and can determine if I have to do something different, or special,
or if it's just well behaved already.
And what about the standard calls like stdio and such, are they threadsafe?
Remember that each thread could be dumping data to a common file in debug mode.
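
As far as I know, the documented curl rules are: call curl_global_init() once before any threads exist, and never have two threads on the same easy handle at the same time. And POSIX does make each individual stdio call on a FILE atomic, but a multi-line debug dump can still interleave with another thread's dump unless we wrap the whole thing in flockfile. Something like this, with debugDump and debugFile as stand-in names:

    #include <stdio.h>

    /* group a multi-line dump so two threads don't interleave their lines */
    void debugDump(FILE *debugFile, const char *url, const char *headers)
    {
        flockfile(debugFile);
        fprintf(debugFile, "fetch %s\n", url);
        fprintf(debugFile, "%s\n", headers);
        funlockfile(debugFile);
    }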

5. What about the framework around curl, primarily httpConnect?
Good lord there are about 20 static variables at the top of http.c that record values and states and the like as we step through the http fetch.
I mean it's as far away from threadsafe as the moon!

"Did you work at your regular job, then at 10 oclock at night just hammer this thing together as fast as you could, just to make it work?"

"Yeah, I kinda did."

So this push is about cleaning up my mess, at least some of it.
You're a programmer, so you know the drill.
All those static variables become members of a structure,
struct i_get    (internet get)
If a routine calls httpConnect, and this happens more often than you might think, including the xhr request, it has an auto variable
	struct i_get g;
We set it up with url and some parameters, and call httpConnect(&g),
and now everything is on the stack where it belongs.
I've been doing some surfing and it seems to work, but boy there was a lot of rewrite!
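
To spell out the shape of it: the members below are placeholders to show the idea, not the real declaration, and the return type of httpConnect is assumed here; the actual struct carries whatever those 20 statics used to hold.

    #include <stdbool.h>
    #include <string.h>

    struct i_get {              /* internet get - illustrative members only */
        const char *url;        /* what to fetch */
        char *buffer;           /* the downloaded data */
        int length;             /* its length */
        int code;               /* http response code */
        /* ... plus whatever other state the fetch needs ... */
    };

    bool httpConnect(struct i_get *g);

    /* a caller, e.g. the xhr path, keeps all of its fetch state on its own stack */
    void fetchSomething(const char *url)
    {
        struct i_get g;
        memset(&g, 0, sizeof(g));
        g.url = url;
        if (httpConnect(&g)) {
            /* use g.buffer, g.length, g.code, then free the buffer */
        }
    }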

This isn't the end. There's the caching system, the web authorization system, finding the proxy, the novs domains; I mean, it calls things all over the place that might not be threadsafe.
But some of them are only used in the foreground thread, the interactive thread that is you typing at the keyboard.
And the whole cache system is probably threadsafe, because I set it up with a locking mechanism,
assuming many edbrowse processes would be accessing the same cache.
So these things can be managed one by one I think.
And you know, even if we never do any of the things in this post, it's still a better, cleaner design.
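
For illustration, the usual way to get that kind of cross-process lock is an ordinary advisory file lock; the sketch below uses flock on a lock file (the path handling and function names are invented, and this isn't necessarily what the edbrowse cache actually does). Since every caller opens the lock file itself, it shuts out other edbrowse processes and, just as usefully, other threads in this one.

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* take the cache lock; hold the returned fd while touching the cache */
    int lockCache(const char *lockpath)
    {
        int fd = open(lockpath, O_CREAT | O_RDWR, 0666);
        if (fd < 0)
            return -1;
        if (flock(fd, LOCK_EX) < 0) {   /* blocks until the cache is free */
            close(fd);
            return -1;
        }
        return fd;
    }

    void unlockCache(int fd)
    {
        flock(fd, LOCK_UN);
        close(fd);
    }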

Karl Dahlke
