edbrowse-dev - development list for edbrowse
 help / color / mirror / Atom feed
* [Edbrowse-dev] stackoverflow and css
@ 2018-02-11 18:53 Karl Dahlke
  2018-02-11 19:25 ` Adam Thompson
  0 siblings, 1 reply; 7+ messages in thread
From: Karl Dahlke @ 2018-02-11 18:53 UTC (permalink / raw)
  To: Edbrowse-dev

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

The css portion, that maps css attributes over to objects, takes 2 minutes to run.
That's not the infinite loop, but it is intolerable nonetheless.
Browse www.stackoverflow.com with db3 and watch 2 minutes go by between

execute eb$qs$start
execution complete

The version we got was built to run and handle all situations, and be robust, but obviously not optimized.
Optimizing things is something I'm good at, but it's a lot of code doing something I'm not entirely familiar with, so I would be taking a big bite.
Could it be optimized and still be javascript? Is it primarily an algorithmic inefficiency?
Or would it have to be rewritten in C?
I hope the former, because turning all that into C would be a pain!
There's no real reason to mess with the css parser, that runs once and pretty fast,
but querySelectorAll compares every css directive against every node in the document, which is potentially an n^2 problem.

Karl Dahlke

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Edbrowse-dev] stackoverflow and css
  2018-02-11 18:53 [Edbrowse-dev] stackoverflow and css Karl Dahlke
@ 2018-02-11 19:25 ` Adam Thompson
  2018-02-11 19:43   ` Adam Thompson
  0 siblings, 1 reply; 7+ messages in thread
From: Adam Thompson @ 2018-02-11 19:25 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Sun, Feb 11, 2018 at 01:53:51PM -0500, Karl Dahlke wrote:
> The css portion, that maps css attributes over to objects, takes 2 minutes to run.
> That's not the infinite loop, but it is intolerable nonetheless.
> Browse www.stackoverflow.com with db3 and watch 2 minutes go by between
> 
> execute eb$qs$start
> execution complete

Wow, yeah... that's not good.

> The version we got was built to run and handle all situations, and be robust, but obviously not optimized.
> Optimizing things is something I'm good at, but it's a lot of code doing something I'm not entirely familiar with, so I would be taking a big bite.
> Could it be optimized and still be javascript? Is it primarily an algorithmic inefficiency?
> Or would it have to be rewritten in C?
> I hope the former, because turning all that into C would be a pain!
> There's no real reason to mess with the css parser, that runs once and pretty fast,
> but querySelectorAll compares every css directive against every node in the document, which is potentially an n^2 problem.

Yeah, that sounds like an algorithm problem, I'll take a look and see if there's anything obvious.
May be there's some way to ignore certain directives, I'm not entirely sure yet.

Thanks for looking at this.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Edbrowse-dev] stackoverflow and css
  2018-02-11 19:25 ` Adam Thompson
@ 2018-02-11 19:43   ` Adam Thompson
  2018-02-11 21:03     ` Karl Dahlke
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Adam Thompson @ 2018-02-11 19:43 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Sun, Feb 11, 2018 at 07:25:31PM +0000, Adam Thompson wrote:
> On Sun, Feb 11, 2018 at 01:53:51PM -0500, Karl Dahlke wrote:
> > The css portion, that maps css attributes over to objects, takes 2 minutes to run.
> > That's not the infinite loop, but it is intolerable nonetheless.
> > Browse www.stackoverflow.com with db3 and watch 2 minutes go by between
> > 
> > execute eb$qs$start
> > execution complete
> 
> Wow, yeah... that's not good.
> 
> > The version we got was built to run and handle all situations, and be robust, but obviously not optimized.
> > Optimizing things is something I'm good at, but it's a lot of code doing something I'm not entirely familiar with, so I would be taking a big bite.
> > Could it be optimized and still be javascript? Is it primarily an algorithmic inefficiency?
> > Or would it have to be rewritten in C?
> > I hope the former, because turning all that into C would be a pain!
> > There's no real reason to mess with the css parser, that runs once and pretty fast,
> > but querySelectorAll compares every css directive against every node in the document, which is potentially an n^2 problem.
> 
> Yeah, that sounds like an algorithm problem, I'll take a look and see if there's anything obvious.
> May be there's some way to ignore certain directives, I'm not entirely sure yet.
> 
> Thanks for looking at this.

Ok, just had a quick look through the code, and my js is... not great...
but I wonder if we could (may be lazily when we first traverse looking for selectors) hash element references based on selectors.  That *should* save traversals in subsequent calls to querySelectorAll.
That being said, I'm not entirely sure I can understand how that code does its job so I may be missing something important there.

Cheers,
Adam.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Edbrowse-dev] stackoverflow and css
  2018-02-11 19:43   ` Adam Thompson
@ 2018-02-11 21:03     ` Karl Dahlke
  2018-02-11 21:24     ` Karl Dahlke
  2018-02-11 23:08     ` Karl Dahlke
  2 siblings, 0 replies; 7+ messages in thread
From: Karl Dahlke @ 2018-02-11 21:03 UTC (permalink / raw)
  To: Edbrowse-dev

[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]

Ok, querySelectorAll was built to run once and work, without regard to the fact that it might run thousands of times over.
Imagine a routine querySelectorAll_prep that traverses the document tree and does things like

hashnodes[e.nodeName].push(e);

So all the div nodes are in the array hashnodes["div"}.
If you have a div selector then just run down this array.
But more common is nodetype with class.

hashnodes[e.nodeName + "||" + e.className].push(e);

If the selector is div.class then you just scoot down a very short array.
Beyond this, there are no checks needed for an unadorned div.class, just apply the attributes.
Even if there are other checks, like div.class and foo=bar, at least you have a short array to check.
You might also set

hashnodes["||" + e.className].push(e);

That lets you look for .xyz i.e. any element of class xyz.
And you need to put every element in hashnodes["*"] because some selectors are just exotic and not tied to a particular node type or class.

But it's a little more complicated.
A selector could be

p.dog,div.cat

That's two selectors in one.
We would have to separate them, split(",");
and iterate over the two selectors, as though they had been separate from the get-go.
Not hard, just another wrinkle.

As I do today, getComputedStyle needs to run querySelectorAll on one particular node, not needing any of the above machinery.
That is the second argument to querySelectorAll as it stands today, a particular node to start at,
but it tries to do the subtree, so I remove the children first, then run querySelectorAll(selectors, e), then put the children back. It's gross.
See third.js line 3947.

Before you or I or anyone takes a whack at this, I'm going to list out all the selectors from stackoverflow,
to get an idea of what is there, and what kind of savings we might realize from the aforementioned node hashing.

Glad to see you back on the list, we miss your talent and your insights.
Maybe you could tackle one of these projects, if you have time,
gopher or querySelectorAll optimization or infinite loop debugging,
whichever one tickles your fancy.
Or anyone else who wants to step up.
I'll keep doing some light research.

Karl Dahlke

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Edbrowse-dev] stackoverflow and css
  2018-02-11 19:43   ` Adam Thompson
  2018-02-11 21:03     ` Karl Dahlke
@ 2018-02-11 21:24     ` Karl Dahlke
  2018-02-13 19:05       ` Adam Thompson
  2018-02-11 23:08     ` Karl Dahlke
  2 siblings, 1 reply; 7+ messages in thread
From: Karl Dahlke @ 2018-02-11 21:24 UTC (permalink / raw)
  To: Edbrowse-dev

If you put

nojs = js-sec.indexww.com

in your config file then stackoverflow browses; we have excised the js with the infinite loop.
Once browsed, I can drop into jdb and list out all the selectors in cssList.
There are 3198 of them.
I have put them on my website for your review.

www.eklhad.net/csslist

Almost all of them specify node.class, and would benefit from the hashing mechanism described in my previous email.

Karl Dahlke

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Edbrowse-dev] stackoverflow and css
  2018-02-11 19:43   ` Adam Thompson
  2018-02-11 21:03     ` Karl Dahlke
  2018-02-11 21:24     ` Karl Dahlke
@ 2018-02-11 23:08     ` Karl Dahlke
  2 siblings, 0 replies; 7+ messages in thread
From: Karl Dahlke @ 2018-02-11 23:08 UTC (permalink / raw)
  To: Edbrowse-dev

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

It's worse than we thought.
I always like to build a local test.
Download www.eklhad.net/csstest.zip and review.

over.css: the selectors from stackoverflow.com, this is their css file.
csstest: the file to browse.

Browse with db3 of course.
eb$qs$start comes back in jig time, which suggests the slowdown is in traversing and testing all the nodes, not in the parsing.
This test file only has 6 nodes, as shown by dumptree(document).
Problems listed below.

1. There is a syntax error.
Their software doesn't allow a classname to start with a dash, and yet it can, I guess.
You see the syntax error with db3.

2. Their program stops at the first error. So it only did part of the list.

3. This error occurs at 471, out of 3198.
If it had gone all the way through, on stackoverflow.com, this routine would take 2 times 3198 / 471 = 13.5 minutes.
So we can rightfully think of a 13.5 minute startup cost just to browse this site.
Obviously we can't live with that.

There are so many issues I almost think we should rewrite it from scratch, either in C or js.
Obviously I'd use theirs as a guide, but lordy! IDK

Karl Dahlke

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Edbrowse-dev] stackoverflow and css
  2018-02-11 21:24     ` Karl Dahlke
@ 2018-02-13 19:05       ` Adam Thompson
  0 siblings, 0 replies; 7+ messages in thread
From: Adam Thompson @ 2018-02-13 19:05 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Sun, Feb 11, 2018 at 04:24:46PM -0500, Karl Dahlke wrote:
> If you put
> 
> nojs = js-sec.indexww.com
>
> in your config file then stackoverflow browses; we have excised the js with the infinite loop.

Thanks.

> Once browsed, I can drop into jdb and list out all the selectors in cssList.
> There are 3198 of them.
> I have put them on my website for your review.
> 
> www.eklhad.net/csslist
> 
> Almost all of them specify node.class, and would benefit from the hashing mechanism described in my previous email.

Wow... yeah, that's a *lot* of selectors.
The hashing mechanism sounds like a good idea, should make things much more efficient.

Cheers,
Adam.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-02-13 19:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-11 18:53 [Edbrowse-dev] stackoverflow and css Karl Dahlke
2018-02-11 19:25 ` Adam Thompson
2018-02-11 19:43   ` Adam Thompson
2018-02-11 21:03     ` Karl Dahlke
2018-02-11 21:24     ` Karl Dahlke
2018-02-13 19:05       ` Adam Thompson
2018-02-11 23:08     ` Karl Dahlke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).