From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com [IPv6:2a00:1450:400c:c05::22e])
        by hurricane.the-brannons.com (Postfix) with ESMTPS id 3FCB87862B
        for ; Sun, 2 Mar 2014 11:29:15 -0800 (PST)
Received: by mail-wi0-f174.google.com with SMTP id f8so2533258wiw.13
        for ; Sun, 02 Mar 2014 11:28:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=date:from:to:cc:subject:message-id:references:mime-version
         :content-type:content-disposition:in-reply-to:user-agent;
        bh=RKWGKG8RO+/NDe42Pds9AXIYsm8m2VK89uPNfcJ20hU=;
        b=ds41N00EKtQm1Wmwgr2G/14fKEi1i3W9rf1k9ez5WWnjIWoHaRwjR2AHfDYxXCb7ZB
         +ou292QIh8kNTof3VOpTk2giDWCvsFVtc8Oz6j2r0LZABHljpOjp75VoYlUgaI9Wgpdn
         ooeFh0Uww6mscIBz7TTiYi2fwAdnOSDHWbzGkMlNkspCeAZ4dFaV79VKqIMMIvhc4Lx4
         TrXMqrkcD28qVTvDfSL7yJIp6AmwBknF1tAi4MMoqDRoDHQrNz6TQ+JRRMuqffgQjCQv
         xRg8mv7Mv5IQyR9fRede18l3bgJOh1TEn/RnrGX/yDfrexVy1G4Et/Hyk/NWegwwbUNE
         N3nw==
X-Received: by 10.180.79.7 with SMTP id f7mr11170590wix.20.1393788487142;
        Sun, 02 Mar 2014 11:28:07 -0800 (PST)
Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk. [2001:8b0:1142:9042::2])
        by mx.google.com with ESMTPSA id br10sm22977025wjb.3.2014.03.02.11.28.05
        for
        (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
        Sun, 02 Mar 2014 11:28:06 -0800 (PST)
Date: Sun, 2 Mar 2014 19:28:03 +0000
From: Adam Thompson
To: Karl Dahlke
Message-ID: <20140302192803.GP19851@toaster.adamthompson.me.uk>
References: <20140202091544.eklhad@comcast.net>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
        protocol="application/pgp-signature"; boundary="b2ktwntdbf0dPnbx"
Content-Disposition: inline
In-Reply-To: <20140202091544.eklhad@comcast.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] tag list
X-BeenThere: edbrowse-dev@lists.the-brannons.com
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Edbrowse Development List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 02 Mar 2014 19:29:15 -0000

--b2ktwntdbf0dPnbx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Mar 02, 2014 at 09:15:44AM -0500, Karl Dahlke wrote:
> > At some stage I really need to familiarise myself with the html code.
>
> Yes, but please ask if unsure. A question can sometimes replace
> days of reading code, especially my code, which isn't well commented.

Tbh it's far from the worst I've seen,
though some more comments on the global and file-scoped variables would be useful.

> > but why are we storing a list of pointers?
>
> Precisely so the structures don't move.
> Each tag can point, by a pointer, to its parent or children
> and those pointers will remain valid,
> even if c++ vector does a realloc on the list of pointers,
> which it will do as new tags are created.
> Chris and I went through this - I even started writing
> vector code, but then I could see the structures
> were moving, and the parent and child links became invalid.
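
To illustrate the point quoted above, here is a minimal sketch of why a list of pointers keeps the parent and child links valid. It is not edbrowse code: the Tag struct, its field names, and the TagList helper are all assumptions made up for this example, and it uses std::unique_ptr only to keep ownership simple.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for edbrowse's htmlTag; the field names are assumptions.
struct Tag {
    std::string name;
    Tag *parent = nullptr;
    std::vector<Tag *> children;
};

// The list owns pointers to tags. When the vector reallocates, only the
// pointers are moved; the Tag objects stay where they were allocated.
struct TagList {
    std::vector<std::unique_ptr<Tag>> tags;

    Tag *create(const std::string &name, Tag *parent) {
        tags.push_back(std::unique_ptr<Tag>(new Tag()));
        Tag *t = tags.back().get();
        t->name = name;
        t->parent = parent;
        if (parent)
            parent->children.push_back(t);
        return t;
    }
};

// Usage: links recorded early still point at the same tags after the
// list has grown (and reallocated) many times.
inline void demo() {
    TagList list;
    Tag *body = list.create("body", nullptr);
    Tag *first = list.create("div", body);
    for (int i = 0; i < 1000; ++i)
        list.create("p", first);
    assert(first->parent == body);
    assert(body->children[0] == first);
    assert(first->children.size() == 1000);
}
```

Had the vector held Tag objects by value, every Tag * taken before a reallocation would be left dangling, which matches the failure described in the quoted text.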

Purely from an optimisation perspective, I wonder if storing the tags by value
in a vector and creating the tree from indices would be better, as then the tags
would be stored contiguously rather than all over the heap,
thus reducing heap fragmentation.
Of course this increases the possibility of failures due to insufficient
contiguous memory being available. At the end of the day it's not a big problem.

> > We also need to store a list of children in each tag, i.e. in the code:
>
> Yes, and javascript has set for us a partial standard;
> should we follow it?
[snip]

I'd rather not. I'd just have a list of children in each htmlTag struct,
then write images, divs etc as wrappers that access this list.
Otherwise we get into sub-classing htmlTag, which is something I'd like to avoid
if possible.

> > Example of
> >
> > The body would have a list of two pointers to the two div tags,
>
> js already has an array of div tags.
> It is called divs, I think.
> I know it has an array of link tags called links, an array of image tags called images, and so on.
> What I don't know is whether this, in the standard, is a global array of all images on the page,
> or a local array of images in the current structure,
> like elements in a form or options in a select.
> We would want the latter.
> More research is needed.

I'm not sure; however, for rendering we need the generic approach above.
The other arrays can be generated from this as necessary.
With some care this shouldn't be too bad performance-wise, I think.

> domlink() in jsdom.cpp is supposed to do all of this.
> And it looks like it treats elements and options as a local list,
> in the current structure, but images and links and heads and metas
> and anchors as a global list under document.
> I don't know if this is right.
> If this is the standard perhaps we can do both,
> document/images[] for all image tags on the page,
> and local/images[] for the array of images that are inside the current paragraph
> or whatever.

I'm not sure; I need to read the DOM spec at some stage to work out exactly what's needed.
Remember that there's a core DOM and then the html extensions to this.
I think we need to get our core DOM working,
then look at the html extensions (images[] etc) on top of this.
At the moment, we've got a partial html DOM but not really the core underneath it,
so appendChild (core DOM I think) and friends are awkward to implement.
If we fix the core DOM, then the html side of it,
this will hopefully make rendering better, add support for appendChild
and tag creation by JS, and probably remove special-case code as well.

> Another aspect of this js standard is it is type specific.
> Here is the list of elements in the form, here is the list
> of images, here is the list of anchors, etc.
> Maybe that's ok, but maybe we also need an array of all tags in order
> within each construct.

We definitely need an array of all children.
Remember this isn't so much a js standard; rather, js provides an implementation
of an interface to the DOM defined by the W3C.
I think if we look at it this way
(i.e. implement our DOM, then the js interface) that's probably better.
It also means that when Mozilla totally change SpiderMonkey again (which they say
they may do at any time) we have a working DOM which just needs new wrapping.
To take this to its logical extreme,
I'd like to separate the html parsing and DOM creation from js,
providing an api to this DOM which is capable enough to support all the stuff
JS needs.
We could then also write the rendering code using this DOM api.

I wonder if there's an html parsing library we can use for some of this.

> > No need to do this rewrite at the moment,
>
> Absolutely agree.
> I think we all agree here.
> Let's get 3.5.1 stable and working with distributed libraries.
> We're just talking, and thinking, and planning for the future,
> and I think it is helpful.

Yeah. I think future planning's a good idea,
particularly when discussing the kind of changes above.
One discussion we should probably have at some stage in this area is what
systems, language standards etc we want to support.
Personally, I'd kind of like to keep most of edbrowse in C,
with interfaces to whatever we need in whatever language (c++ for SpiderMonkey
js for example);
however, I know people seem to want to use c++ for various things.

Cheers,
Adam.
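
For reference, the generic children list discussed above could look roughly like the sketch below, with type-specific lists such as images generated from it on demand. This is an illustration under assumptions, not a design for edbrowse: Node, appendChild and collectByName are hypothetical names, and ownership of the nodes is left to the caller.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical core-DOM node; not the real htmlTag layout.
struct Node {
    std::string name;
    Node *parent = nullptr;
    std::vector<Node *> children;  // all children, in document order

    explicit Node(std::string n) : name(std::move(n)) {}
};

// Link child under parent; a rough analogue of the DOM's appendChild.
inline void appendChild(Node &parent, Node &child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}

// Collect every descendant of root with the given tag name, in document
// order. Starting at the document root gives a global list; starting at
// any other node gives the list local to that subtree.
inline void collectByName(const Node &root, const std::string &name,
                          std::vector<const Node *> &out) {
    for (const Node *child : root.children) {
        if (child->name == name)
            out.push_back(child);
        collectByName(*child, name, out);
    }
}

// Usage: one tree, two views of its images.
inline void demo() {
    Node body("body"), div1("div"), div2("div"), img1("img"), img2("img");
    appendChild(body, div1);
    appendChild(body, div2);
    appendChild(div1, img1);
    appendChild(div2, img2);

    std::vector<const Node *> all, local;
    collectByName(body, "img", all);    // every image under body
    collectByName(div2, "img", local);  // only the images inside div2
    assert(all.size() == 2);
    assert(local.size() == 1 && local[0] == &img2);
}
```

With this shape the same traversal answers both the global and the local question, so no per-type subclass of the node is needed.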
--b2ktwntdbf0dPnbx Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJTE4ZDAAoJELZ22lNQBzHOKsAH/R98zRb9EpX1nskJptStOt4w e8l9NlsNNbG/FDN1xBLagAYgs02Y37h47uJPKlU0/riJ2bZQmPQ0bEpc5EBOMUjc 2X61iXDIP4e3eGLOGHR6ZQ4ET2MGbklmES0uasaYz5It89/ViIIUQXuimiGGXyWw 3EUrmFx8pDgqZqqC729cgHDGXjexeJE1aBBWw/jGf9wjCkVfqa6x1OOetcyO2q1k ckWx5sRiD65K/wQqqRZAK0lIBLU35RGu0Gl+uhtNcb1vN6p5TjKtfmsF9sqbRWt4 Ex65vWskPPtEp3jRdUhYWscmgCGoanKyyLtuv8MJO663J6xxOr/K4DDQcfW4aFc= =Qltr -----END PGP SIGNATURE----- --b2ktwntdbf0dPnbx--