edbrowse-dev - development list for edbrowse
From: Adam Thompson <arthompson1990@gmail.com>
To: Karl Dahlke <eklhad@comcast.net>
Cc: Edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] tag list
Date: Sun, 2 Mar 2014 19:28:03 +0000
Message-ID: <20140302192803.GP19851@toaster.adamthompson.me.uk>
In-Reply-To: <20140202091544.eklhad@comcast.net>

On Sun, Mar 02, 2014 at 09:15:44AM -0500, Karl Dahlke wrote:
> > At some stage I really need to familiarise myself with the html code.
> 
> Yes, but please ask if unsure. A question can sometimes replace
> days of reading code, especially my code, which isn't well commented.

Tbh it's far from the worst I've seen,
though some more comments on the global and file-scoped variables would be useful.

> > but why are we storing a list of pointers?
> 
> Precisely so the structures don't move.
> Each tag can point, by a pointer, to its parent or children
> and those pointers will remain valid,
> even if c++ vector does a realloc on the list of pointers,
> which it will do as new tags are created.
> Chris and I went through this - I even started writing
> vector<struct htmlTag> code, but then I could see the structures
> were moving, and the parent and child links became invalid.

From a purely optimisation perspective, I wonder if storing vector<htmlTag> and
creating the tree from indices would be better as then the tags would be stored contiguously rather
than all over the heap, thus reducing heap fragmentation.
Of course this increases the possibility of failures due to insufficient
contiguous memory being available. At the end of the day it's not a big problem.
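
A minimal sketch of the index-based idea, assuming a hypothetical, cut-down
htmlTag with only a name, a parent index and a list of child indices (these
field names and addTag() are illustrative, not taken from the edbrowse
source). Links are stored as positions in the vector rather than addresses,
so they stay valid when the vector reallocates:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified tag; the real struct htmlTag has many more fields.
struct htmlTag {
    std::string name;
    int parent;                 // index into the tag vector, -1 for the root
    std::vector<int> children;  // indices into the tag vector
};

// Append a tag and link it to its parent by index; returns the new index.
int addTag(std::vector<htmlTag> &tags, const std::string &name, int parent)
{
    int idx = static_cast<int>(tags.size());
    assert(parent >= -1 && parent < idx);   // parent must already exist
    tags.push_back(htmlTag{name, parent, {}});
    if (parent >= 0)
        tags[parent].children.push_back(idx);
    return idx;
}
```

The cost is that every link has to be resolved through the vector
(tags[idx]) rather than followed directly, and removing a tag from the
middle of the vector would shift the indices after it.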

> > We also need to store a list of children in each tag, i.e. in the code:
> 
> Yes, and javascript has set for us a partial standard;
> should we follow it?
[snip]
I'd rather not, I'd just have a list of children in each htmlTag struct,
then write images, divs etc as wrappers that access this list.
Otherwise we get into sub-classing htmlTag which is something I'd like to avoid if possible.

> > Example of <body> <div>
> >
> > The body would have a list of two pointers to the two div tags,
> 
> js already has an array of div tags.
> It is called divs, I think.
> I know it has an array of link tags called links, an array of image tags called images, and so on.
> What I don't know is whether this, in the standard, is a global array of all images on the page,
> or a local array of images in the current structure,
> like elements in a form or options in a select.
> We would want the latter.
> More research is needed.

I'm not sure, however for rendering we need the generic approach above.
The other arrays can be generated from this as necessary.
With some care this shouldn't be too bad performance-wise I think.
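
As a rough illustration of generating the type-specific arrays from the
generic child list, here is a sketch assuming a hypothetical htmlTag that
holds only a name and a vector of child pointers (collectByName() is an
invented helper, not an edbrowse function). The same walk gives a page-wide
list when started at the document root and a local list when started at any
other tag:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified tag: just a name and a generic list of children.
struct htmlTag {
    std::string name;
    std::vector<htmlTag *> children;
};

// Collect every descendant of root with the given tag name, in document
// order. The root itself is not included in the result.
void collectByName(const htmlTag *root, const std::string &name,
                   std::vector<const htmlTag *> &out)
{
    assert(root != nullptr);
    for (const htmlTag *child : root->children) {
        if (child->name == name)
            out.push_back(child);
        collectByName(child, name, out);
    }
}
```

Whether such lists are rebuilt on demand or cached and invalidated when the
tree changes is a separate design decision that this sketch leaves open.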

> domlink() in jsdom.cpp is supposed to do all of this.
> And it looks like it treats elements and options as a local list,
> in the current structure, but images and links and heads and metas
> and anchors as a global list under document.
> I don't know if this is right.
> If this is the standard perhaps we can do both,
> document/images[] for all image tags on the page,
> and local/images[] for the array of images that are inside the current paragraph
> or whatever.

I'm not sure, I need to read the DOM spec at some stage to work out exactly what's needed.
Remember that there's a core DOM and then the html extensions to this.
I think we need to get our core DOM working,
then look at the html extensions (images[] etc) on top of this.
At the moment, we've got a partial html DOM but not really the core underneath
it so appendChild (core DOM I think) and friends are awkward to implement.
If we fix the core DOM, then the html side of it,
this will hopefully make rendering better,
add support for appendChild and tag creation by JS,
as well as probably removing special-case code.

> Another aspect of this js standard is it is type specific.
> Here is the list of elements in the form, here is the list
> of images, here is the list of anchors, etc.
> Maybe that's ok, but maybe we also need an array of all tags in order
> within each construct.

We definitely need an array of all children.
Remember this isn't so much a js standard,
more js provides an implementation of an interface to the DOM defined by the W3C.
I think if we look at it this way (i.e. implement our DOM, then the js
interface) that's probably better.
It also means that when Mozilla totally change SpiderMonkey again (which they
say they may do at any time) we have a working DOM which just needs new wrapping.

To take this to its logical extreme,
I'd like to separate the html parsing and DOM creation from js,
providing an api to this DOM which is capable enough to support all the stuff JS needs.
We could then also write the rendering code using this DOM api as well.
I wonder if there's an html parsing library we can use for some of this.
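
To make the idea of a JS-independent DOM api a little more concrete, here is
a sketch of a core-DOM style appendChild written as plain C++ with no
reference to any JS engine. The domNode struct and the function signature
are assumptions for illustration only. The one piece of behaviour taken from
the W3C core DOM is that appending a node which already has a parent first
removes it from that parent:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical core-DOM node, independent of any JS engine.
struct domNode {
    std::string name;
    domNode *parent = nullptr;
    std::vector<domNode *> children;
};

// Core-DOM style appendChild: detach the child from its old parent, if any,
// then add it as the last child of the new parent. Returns the child.
domNode *appendChild(domNode *parent, domNode *child)
{
    assert(parent != nullptr && child != nullptr && parent != child);
    if (child->parent != nullptr) {
        std::vector<domNode *> &old = child->parent->children;
        old.erase(std::remove(old.begin(), old.end(), child), old.end());
    }
    child->parent = parent;
    parent->children.push_back(child);
    return child;
}
```

A JS wrapper and the rendering code could both call a function like this.
The sketch does not guard against appending a node into its own subtree,
which a real implementation would have to reject.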

> > No need to do this rewrite at the moment,
> 
> Absolutely agree.
> I think we all agree here.
> Let's get 3.5.1 stable and working with distributed libraries.
> We're just talking, and thinking, and planning for the future,
> and I think it is helpful.

Yeah. I think future planning's a good idea,
particularly when discussing the kind of changes above.
One discussion we should probably have at some stage in this area is what
systems, language standards etc we want to support.

Personally, I'd kind of like to keep most of edbrowse in C,
with interfaces to whatever we need in whatever language (c++ for SpiderMonkey js for
example), however I know people seem to want to use c++ for various things.
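
One common way to get this kind of split is to keep the C++ behind an
extern "C" boundary, so the C code only ever sees an opaque handle and a few
plain functions. The sketch below is illustrative only: domTree and the
dom_* function names are invented here and do not exist in edbrowse.

```cpp
#include <cassert>
#include <string>
#include <vector>

// C++ side: hypothetical DOM storage, hidden from C callers.
struct domTree {
    std::vector<std::string> tagNames;
};

// C-callable interface. In a real build the declarations would live in a
// header shared with the C code, which would treat domTree as opaque.
extern "C" {
    domTree *dom_new(void) { return new domTree(); }

    void dom_free(domTree *t) { delete t; }

    // Adds a tag name and returns its index.
    int dom_add_tag(domTree *t, const char *name)
    {
        assert(t != nullptr && name != nullptr);
        t->tagNames.push_back(name);
        return static_cast<int>(t->tagNames.size()) - 1;
    }

    int dom_tag_count(const domTree *t)
    {
        assert(t != nullptr);
        return static_cast<int>(t->tagNames.size());
    }
}
```

A production wrapper would also need to stop C++ exceptions (for example
std::bad_alloc) from escaping into C callers, which this sketch does not
attempt.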

Cheers,
Adam.