edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] startwindow / class NodeList
@ 2015-08-10  8:56 Kevin Carhart
  2015-08-11 21:38 ` Adam Thompson
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Carhart @ 2015-08-10  8:56 UTC (permalink / raw)
  To: Edbrowse-dev



Hi guys

I was familiarizing myself with DOM discussions on the listserv over 
2014-15, so that I can hopefully contribute and be up to date on 
recent work when trying to figure out how.  I've installed from github, 
and have been taking a look at the startwindow.js.  Maybe I could help 
build out some pieces of DOM?

I noticed something in past DOM discussions on the list.  A while ago, 
Adam said, "Also, as per comments in startwindow.js, apparently we're 
supposed to return something called a node list object rather than an 
array from getElement* functions. No idea what one of these is yet, more 
future research I think."

You may know this already, but the env.js code has a specification for 
NodeList!  I put it up at:
http://carhart.net/~kevin/nodelist.js

Adam, you said you had taken a look at env.js, right?  I know it isn't 
suitable as an implementation out of the box, but do you think env is 
useable as a guide, for example if I was to try and adapt 
document.getElementsByTagName to return a NodeList?
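
For reference, the NodeList interface in the DOM specification is small: a length plus an item(index) accessor that yields null when the index is out of range. A minimal host-side sketch in C follows. The names struct Node, struct NodeList, nodelist_item and nodelist_by_tag are hypothetical and are not taken from edbrowse or env.js. The specification also describes such lists as live, whereas this sketch only takes a snapshot.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for illustration; edbrowse's real structures differ. */
struct Node {
    const char *tagName;
};

/* NodeList-like container: a length plus indexed access through item(). */
struct NodeList {
    struct Node **nodes;
    size_t length;
};

/* Mirrors NodeList.item(): returns NULL when index is out of range. */
struct Node *nodelist_item(const struct NodeList *list, size_t index)
{
    if (list == NULL || index >= list->length)
        return NULL;
    return list->nodes[index];
}

/* Snapshot of the nodes in all[0..count) whose tag name matches.
 * Returns 0 on success, -1 on allocation failure.
 * The caller frees out->nodes. */
int nodelist_by_tag(struct Node **all, size_t count, const char *tag,
                    struct NodeList *out)
{
    size_t i;

    out->length = 0;
    out->nodes = malloc((count ? count : 1) * sizeof *out->nodes);
    if (out->nodes == NULL)
        return -1;
    for (i = 0; i < count; i++)
        if (strcmp(all[i]->tagName, tag) == 0)
            out->nodes[out->length++] = all[i];
    return 0;
}
```

A getElementsByTagName built on this would return the list itself, so callers go through length and item() instead of indexing a plain array.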

I notice in the env code that they cite URLs from w3, which makes me 
hopeful that they have done good generic work that just happens to be in 
javascript.  Here's what they have to say about document and about the 
'Node' class,
http://carhart.net/~kevin/document.js
http://carhart.net/~kevin/node.js

thanks!
Kevin

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists


* Re: [Edbrowse-dev] startwindow / class NodeList
  2015-08-10  8:56 [Edbrowse-dev] startwindow / class NodeList Kevin Carhart
@ 2015-08-11 21:38 ` Adam Thompson
  2015-08-12  0:15   ` Karl Dahlke
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Thompson @ 2015-08-11 21:38 UTC (permalink / raw)
  To: Kevin Carhart; +Cc: Edbrowse-dev


Hi Kevin,

On Mon, Aug 10, 2015 at 01:56:27AM -0700, Kevin Carhart wrote:
> 
> I was familiarizing myself with DOM discussions on the listserv over
> 2014-15, so that I can hopefully contribute and be up to date on recent work
> when trying to figure out how.  I've installed from github, and have been
> taking a look at the startwindow.js.  Maybe I could help build out some
> pieces of DOM?
> 
> I noticed something in past DOM discussions on the list.  A while ago, Adam
> said, "Also, as per comments in startwindow.js, apparently we're supposed to
> return something called a node list object rather than an array from
> getElement* functions. No idea what one of these is yet, more future
> research I think."
> 
> You may know this already, but the env.js code has a specification for
> NodeList!  I put it up at:
> http://carhart.net/~kevin/nodelist.js
> 
> Adam, you said you had taken a look at env.js, right?  I know it isn't
> suitable as an implementation out of the box, but do you think env is
> useable as a guide, for example if I was to try and adapt
> document.getElementsByTagName to return a NodeList?

I took a look, but it seemed heavily bound to a specific js engine when I looked.
Part of the problem we'd have with using a js DOM like this is that there's
currently very little connection between the js objects and the rendered page.
This isn't easy to solve, and the DOM spec seems to indicate that large parts
of it are expected to be implemented directly by the host, *not* in javascript.
Thus, I'm actually considering pulling some stuff *out of* startwindow.js and
putting it back into C as the alternative is trying to hook up custom js
objects to our renderer which I can't imagine working terribly well.

> I notice in the env code that they cite URLs from w3, which makes me hopeful
> that they have done good generic work that just happens to be in javascript.
> Here's what they have to say about document and about the 'Node' class,
> http://carhart.net/~kevin/document.js
> http://carhart.net/~kevin/node.js

I'll certainly have a look at all this again though and see what we can use,
but I really think we'll just need to suck it up and do the work in our C
code rather than the js engine if we ever want to make anything robust.
The current situation's bad enough, with some functions in startwindow.js
having seemingly *no* impact on the rendered text despite the fact that they really should.

There's also the mention of host objects which I came across when reading about DOM.
These sound like they're exposed to js as objects but are designed not to be js
objects in reality so, whereas in most situations they behave just like a
standard object, certain parts can't be changed (prototypes I think).
Again, more research required.

Cheers,
Adam.


* [Edbrowse-dev]  startwindow / class NodeList
  2015-08-11 21:38 ` Adam Thompson
@ 2015-08-12  0:15   ` Karl Dahlke
  2015-08-12 19:55     ` Kevin Carhart
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Dahlke @ 2015-08-12  0:15 UTC (permalink / raw)
  To: Edbrowse-dev

Hi Kevin,

I haven't looked at your websites, my bad,
but let me just say for now that we could always use an extra hand
in taking the next step, as it will probably be a fairly large step.
Adam is somewhat pessimistic about dom functionality in a startup js script,
I am more optimistic about it, or I might say, hopeful / wishful,
since it is so much easier to implement and maintain and understand there,
rather than the C world with its engine specific code etc.
But wishful is not always right, is it.

Even if startwindow.js is temporary, it sure has brought us a lot of
functionality for not a lot of code,
to help us see the path ahead, maybe like a prototype.
If 100 lines of that has to be native code, it probably becomes 600 lines of C.
That's the currency exchange rate.
But if that's where it belongs then so be it.

Another dimension is html parsing and the building of a corresponding tree of node objects
and corresponding js objects.
Chris is looking into tidy5 to help us do that, rather than my home grown parser.
Things usually get better when we move away from my home grown code.     :)

Then there is the use of alternate js engines, besides Mozilla 24,
which works but is a bit clunky.
Adam is looking into this.

I think we all have a lot of other stuff going on in life,
which has slowed us down a bit of late.
Now if I could just clone myself...

Karl Dahlke


* Re: [Edbrowse-dev] startwindow / class NodeList
  2015-08-12  0:15   ` Karl Dahlke
@ 2015-08-12 19:55     ` Kevin Carhart
  2015-08-12 20:56       ` Karl Dahlke
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Carhart @ 2015-08-12 19:55 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev



Hi Adam and Karl,

Thanks for writing, I take your points about where to do what.

I see the points about the back and forth, can we, can't we.  I'm not 
married to writing javascript so maybe I could help build out some 
pieces of DOM in C when the time comes.  I remember the tidy5 thread and I 
agree that this will be huge, because doesn't this mean you will have an 
easier time from now on, doing the types of translations that jSyncup 
does?  So tidy5 is a big deal!  Maybe I should wait a while and see how 
the landscape for how to work changes with the parser?

> Even if startwindow.js is temporary, it sure has brought us a lot of
> functionality for not a lot of code,
> to help us see the path ahead, maybe like a prototype.

I know what you mean.  In order to get over the chicken and egg problem of 
not knowing anything and not knowing how to find out, I did something like 
this and compiled a very marked-up edbrowse as a dynamic harness for 
loading a web page over and over and playing along with what 
happens.  What I've observed is that then it raises a question around 
generality and specificity: so I'm knee-deep in the real world, weird code 
of a live website (like amazon, or yellowpages or yelp), how do I know if 
the situations I'm seeing will come up enough generally for the effort to 
have been worth it?  Or does this uncertainty mean that the 
reverse-engineering technique is extraneous?  Anyway, I did some of this.. 
it's like Karl has written from time to time - you also reverse particular 
pages because you are trying to get that page working for yourself in the 
short term.

> I think we all have a lot of other stuff going on in life,
> which has slowed us down a bit of late.
I understand.. I have set it down for long periods.  I picked edbrowse 
back up about a month ago and have been a bit addicted since then.

thanks
Kevin

PS As a short aside, Karl, you have periodically said, I don't foresee 
anyone writing journal articles about us.  I wonder if someone like Norton 
or McAfee would show some interest.  Have you considered trying to 
interest the anti-malware world in edbrowse as a safe quarantine or 
forensic environment for a page containing bad javascript?




--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists


* [Edbrowse-dev]   startwindow / class NodeList
  2015-08-12 19:55     ` Kevin Carhart
@ 2015-08-12 20:56       ` Karl Dahlke
  2015-08-13  1:08         ` Chris Brannon
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Dahlke @ 2015-08-12 20:56 UTC (permalink / raw)
  To: Edbrowse-dev

> Maybe I should wait a while

Well I hate to miss out on your time and talent.
Volunteer developers for this niche project are scarce indeed.
As a semi-coordinator, might I ask / suggest that perhaps
you look into the tidy5 translation, which is something we all agree on.
Chris was going to work on it but that was a while ago
so he may be involved in other things.
He's on this list so I'll let him chime in here.
But if you want you could work on that,
and Chris and I could look into imap or other orthogonal tasks
as time permits.
I sent Chris an email that he could forward to you
about html.c and the parser and hooks etc.;
I would send it to you but can't seem to find it right now.

> I wonder if someone like Norton or McAfee would show some interest.

I think so, but probably we need a better js dom implementation,
that works with almost all websites, as Adam keeps saying,
and he's right.
When this browser is on par with others we can and should promote it.

Karl Dahlke


* Re: [Edbrowse-dev] startwindow / class NodeList
  2015-08-12 20:56       ` Karl Dahlke
@ 2015-08-13  1:08         ` Chris Brannon
  2015-08-13  4:36           ` [Edbrowse-dev] tidy5 Kevin Carhart
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Brannon @ 2015-08-13  1:08 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> As a semi-coordinator, might I ask / suggest that perhaps
> you look into the tidy5 translation, which is something we all agree on.
> Chris was going to work on it but that was a while ago
> He's on this list so I'll let him chime in here.

I just keep having false starts every time I look at this.
If you want to work on this all by yourself, feel free, and I'm happy to
forward the mail I have.  Or we can collaborate on this together, if
you'd prefer that.

-- Chris


* [Edbrowse-dev] tidy5
  2015-08-13  1:08         ` Chris Brannon
@ 2015-08-13  4:36           ` Kevin Carhart
  2015-08-13 20:07             ` Adam Thompson
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Carhart @ 2015-08-13  4:36 UTC (permalink / raw)
  To: Chris Brannon; +Cc: Edbrowse-dev




Hi all

This sounds great-  thanks for the suggestion.  I hope the software works 
for our purposes.

> forward the mail I have.  Or we can collaborate on this together, if

Yes, whatever works, thanks Chris!  Please let me know your findings so 
far.

Kevin


* Re: [Edbrowse-dev] tidy5
  2015-08-13  4:36           ` [Edbrowse-dev] tidy5 Kevin Carhart
@ 2015-08-13 20:07             ` Adam Thompson
  2015-08-14  0:54               ` Kevin Carhart
  2015-08-14  3:37               ` Karl Dahlke
  0 siblings, 2 replies; 19+ messages in thread
From: Adam Thompson @ 2015-08-13 20:07 UTC (permalink / raw)
  To: Kevin Carhart; +Cc: Edbrowse-dev


On Wed, Aug 12, 2015 at 09:36:51PM -0700, Kevin Carhart wrote:
> 
> 
> Hi all
> 
> This sounds great-  thanks for the suggestion.  I hope the software works
> for our purposes.
> 
> >forward the mail I have.  Or we can collaborate on this together, if
> 
> Yes, whatever works, thanks Chris!  Please let me know your findings so far.

I'm also happy to help if there's something I can do.
I know I said I'd look into a new js engine,
but I really think we need to get the html and DOM stuff sorted before that.

In terms of an architecture I'm thinking of aiming to have the DOM as an
abstraction which can be used by both the rendering code and the js. Thus:
- html is parsed into a node tree which is converted to our DOM objects.
- These objects are exposed to js via wrapper objects in the js world such that
  any changes js makes are automatically passed through to the DOM.
- The renderer renders the DOM automatically on page load,
  with support for re-rendering on a user command (with some sort of
  notifications for js induced changes).
- Form fields are altered in the DOM, which may or may not trigger a re-rendering.
- Any re-rendering would be partial, i.e.
  only the changed segments of the DOM are re-rendered.
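
One way to picture the notification part of this design is a single mutation path that both a js wrapper and the form-field code would call, with a flag telling the renderer that its output may be stale. This is only a sketch under assumptions: struct DomDocument, struct DomElement, dom_set_value and dom_take_dirty are hypothetical names, not existing edbrowse code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical document object shared by renderer and js wrappers. */
struct DomDocument {
    int dirty;              /* set on any change, cleared by the renderer */
};

/* Hypothetical element holding one value, e.g. a form field. */
struct DomElement {
    struct DomDocument *doc;
    char *value;            /* heap-allocated, or NULL if unset */
};

/* The one place where a value changes. Returns 0 on success, -1 on
 * allocation failure. Marks the document dirty only on a real change. */
int dom_set_value(struct DomElement *el, const char *value)
{
    char *copy;

    if (el->value != NULL && strcmp(el->value, value) == 0)
        return 0;           /* nothing changed, no re-render needed */
    copy = malloc(strlen(value) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, value);
    free(el->value);
    el->value = copy;
    el->doc->dirty = 1;     /* the rendered text may now be stale */
    return 0;
}

/* Returns 1 if a re-render is needed, and clears the flag. */
int dom_take_dirty(struct DomDocument *doc)
{
    int was_dirty = doc->dirty;

    doc->dirty = 0;
    return was_dirty;
}
```

The renderer would call dom_take_dirty() after js has run and re-render only when it returns 1. Tracking which segments changed, as the last point suggests, would need a flag per node instead of one per document.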

This is going to be a *lot* of work and I don't expect it to all be done at
once, but that's certainly where I think we should be headed. Any thoughts?

As for Edbrowse being used in cyber security,
this isn't a good idea since most systems which analyse web pages for threats
use highly advanced techniques to scan for malware which don't involve
executing the javascript directly, and any such execution would probably
require analysis on the js engine level to detect suspicious behaviours.
None of these tasks would be possible with Edbrowse,
and altering it to make such things possible would mean we weren't writing a
web browser any more.
That's before we get into the security of the browser itself,
which probably could do with some careful analysis at some stage anyway,
particularly as we plan on making this a larger project.

However, I can see a definite place for Edbrowse for page automation etc once
we are more standards compliant.

Cheers,
Adam.


* Re: [Edbrowse-dev] tidy5
  2015-08-13 20:07             ` Adam Thompson
@ 2015-08-14  0:54               ` Kevin Carhart
  2015-08-14  3:45                 ` Karl Dahlke
  2015-08-14  3:37               ` Karl Dahlke
  1 sibling, 1 reply; 19+ messages in thread
From: Kevin Carhart @ 2015-08-14  0:54 UTC (permalink / raw)
  To: Edbrowse-dev



> I know I said I'd look into a new js engine,
> but I really think we need to get the html and DOM stuff sorted before that.
>
> In terms of an architecture I'm thinking of aiming to have the DOM as an
> abstraction which can be used by both the rendering code and the js. Thus:

Thanks Adam!  Exciting.

OK, so far I compiled the tidy code and ran their sample program with 
libtidy calls.  The possibility of interoperability is very cool.
Am I on the right track in thinking, well tidy has a central "switch-case" 
section over various tag types, and we have a central "switch-case" in 
encodeTags, so this would be the place where you bring in tidy calls?

For methodology of how to proceed, I am happy with any & all methods.  I 
don't write C professionally but I know some parts of the edbrowse source 
pretty well at this point.  At least I'm now on a first-name basis with 
encodeTags.  (javaParseExecute and I used to babysit each other's 
children.)

> As for Edbrowse being used in cyber security,
> this isn't a good idea since most systems which analyse web pages for threats
> use highly advanced techniques to scan for malware which don't involve
> executing the javascript directly, and any such execution would probably
> require analysis on the js engine level to detect suspicious behaviours.

Ahhhh, I see.  That makes sense.

Kevin


--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists


* [Edbrowse-dev]  tidy5
  2015-08-13 20:07             ` Adam Thompson
  2015-08-14  0:54               ` Kevin Carhart
@ 2015-08-14  3:37               ` Karl Dahlke
  2015-08-16 18:10                 ` Adam Thompson
  1 sibling, 1 reply; 19+ messages in thread
From: Karl Dahlke @ 2015-08-14  3:37 UTC (permalink / raw)
  To: Edbrowse-dev

> In terms of an architecture I'm thinking of aiming to have the DOM as an
> abstraction which can be used by both the rendering code and the js. Thus:
> html is parsed into a node tree which is converted to our DOM objects
> These objects are exposed to js via wrapper objects in the js world such that
> any changes js makes are automatically passed through to the DOM
> The renderer renders the DOM automatically on page load,
> with support for re-rendering on a user command (with some sort of
> notifications for js induced changes)
> Form fields are altered in the DOM, which may or may not trigger a re-rendering

Yes, this can cause a rerender, for example onchange or onselect code,
as exercised by the regression tests in jsrt.

> Any re-rendering would be partial, i.e.
> only the changed segments of the DOM are re-rendered

This sounds like a diff between the old dom and the new,
but it's easier to just rerender and then diff the old buffer against the new,
and then report the lines that have changed, which is how edbrowse works today.
Realize that a small change in dom could change the buffer
on down the page, even into dom elements that have not changed.
So I think you always want to just call render() and then
diff the two buffers.
Maybe even a diff library we can use, if not /bin/diff itself.
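
A rough sketch of the render-then-diff idea, assuming each buffer is an array of lines. It is not a full diff: it only trims the lines the two buffers share at the top and at the bottom and reports the range in between, which is a simplification of what a diff library or /bin/diff would do. The names struct ChangedRange and diff_buffers are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The span of lines that differs between two buffers. */
struct ChangedRange {
    size_t start;       /* index of the first differing line (0-based) */
    size_t old_count;   /* number of differing lines in the old buffer */
    size_t new_count;   /* number of differing lines in the new buffer */
};

struct ChangedRange diff_buffers(const char *const *old_lines, size_t old_n,
                                 const char *const *new_lines, size_t new_n)
{
    size_t prefix = 0, suffix = 0;
    struct ChangedRange r;

    /* Skip the lines both buffers share at the top. */
    while (prefix < old_n && prefix < new_n &&
           strcmp(old_lines[prefix], new_lines[prefix]) == 0)
        prefix++;

    /* Skip the lines both buffers share at the bottom,
     * without running back into the shared prefix. */
    while (suffix < old_n - prefix && suffix < new_n - prefix &&
           strcmp(old_lines[old_n - 1 - suffix],
                  new_lines[new_n - 1 - suffix]) == 0)
        suffix++;

    r.start = prefix;
    r.old_count = old_n - prefix - suffix;
    r.new_count = new_n - prefix - suffix;
    return r;
}
```

If old_count and new_count are both zero the rerender changed nothing. Otherwise the lines from start onward are the ones to report to the user.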

These are minor points; and you are definitely on track.
This is where we need to be.

Karl Dahlke


* [Edbrowse-dev]  tidy5
  2015-08-14  0:54               ` Kevin Carhart
@ 2015-08-14  3:45                 ` Karl Dahlke
  2015-08-14 20:17                   ` Chris Brannon
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Dahlke @ 2015-08-14  3:45 UTC (permalink / raw)
  To: Edbrowse-dev

> Am I on the right track in thinking, well tidy has a central "switch-case"
> section over various tag types, and we have a central "switch-case" in

Forgive me I haven't looked at the code at all,
but I would guess there's a tidy5 encodeTags() that takes the
html text and makes the tree.
We would just call that instead of our encodeTags(),
thus slicing out all that home grown html parsing code that I wrote,
I don't want to be in that business any more.
We would then follow up with software to traverse their node tree
and build our node tree.
The new tree will have more nodes than ours does today,
a node for every tag, not just some tags,
a node for each block of text, a node for each html comment.
So a lot more nodes, but perhaps somewhat backward compatible
with what we have today, at least for the first pass,
at least to get us going.
Then we improve and improve and improve.


Karl Dahlke


* Re: [Edbrowse-dev] tidy5
  2015-08-14  3:45                 ` Karl Dahlke
@ 2015-08-14 20:17                   ` Chris Brannon
  2015-08-16  5:54                     ` Kevin Carhart
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Brannon @ 2015-08-14 20:17 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> Forgive me I haven't looked at the code at all,
> but I would guess there's a tidy5 encodeTags() that takes the
> html text and makes the tree.

Essentially true.  The details are a bit more complicated, but this is
the idea.  We call tidy to parse the html, and we get back a structure
from tidy called a document.  It contains our tree of nodes, and we
can iterate over it.  The problem is, this is a usable parse tree for
the html, but it isn't a true DOM.  We can remove nodes and attributes
from the tree, but we can't add them.  That causes problems for JS that
needs to add new nodes.  So we're going to have to take that parse tree
we get back from Tidy5, build our own DOM out of it, and eventually
render it.
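
The copy step described above might look roughly like this. To keep the sketch self-contained, struct ParseNode is a hypothetical stand-in for the read-only tree, not the real tidy types. With libtidy the walk would presumably go through its own accessors such as tidyGetRoot, tidyGetChild and tidyGetNext. struct DomNode and the dom_* functions are likewise made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a read-only parse tree node. */
struct ParseNode {
    const char *name;
    const struct ParseNode *child;   /* first child */
    const struct ParseNode *next;    /* next sibling */
};

/* Our own node, which js can later add to. */
struct DomNode {
    char *name;
    struct DomNode *parent, *firstChild, *lastChild, *nextSibling;
};

struct DomNode *dom_new(const char *name)
{
    struct DomNode *n = calloc(1, sizeof *n);

    if (n == NULL)
        return NULL;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);
    return n;
}

/* The operation the parse tree lacks: adding a node. */
void dom_append_child(struct DomNode *parent, struct DomNode *child)
{
    child->parent = parent;
    child->nextSibling = NULL;
    if (parent->lastChild != NULL)
        parent->lastChild->nextSibling = child;
    else
        parent->firstChild = child;
    parent->lastChild = child;
}

void dom_free(struct DomNode *n)
{
    struct DomNode *c, *next;

    if (n == NULL)
        return;
    for (c = n->firstChild; c != NULL; c = next) {
        next = c->nextSibling;
        dom_free(c);
    }
    free(n->name);
    free(n);
}

/* Recursively copy the parse tree into a mutable DOM tree. */
struct DomNode *dom_from_parse(const struct ParseNode *p)
{
    const struct ParseNode *pc;
    struct DomNode *n = dom_new(p->name);

    if (n == NULL)
        return NULL;
    for (pc = p->child; pc != NULL; pc = pc->next) {
        struct DomNode *c = dom_from_parse(pc);
        if (c == NULL) {
            dom_free(n);
            return NULL;
        }
        dom_append_child(n, c);
    }
    return n;
}
```

Once the copy exists, a js call that creates an element maps onto dom_new() followed by dom_append_child(), and the parse tree can be discarded.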

-- Chris


* Re: [Edbrowse-dev] tidy5
  2015-08-14 20:17                   ` Chris Brannon
@ 2015-08-16  5:54                     ` Kevin Carhart
  2015-08-16 10:38                       ` Karl Dahlke
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Carhart @ 2015-08-16  5:54 UTC (permalink / raw)
  To: Chris Brannon; +Cc: Edbrowse-dev



Chris said:
> Essentially true.  The details are a bit more complicated, but this is
> the idea.  We call tidy to parse the html, and we get back a structure
> from tidy called a document.  It contains our tree of nodes, and we
> can iterate over it.  The problem is, this is a usable parse tree for
> the html, but it isn't a true DOM.  We can remove nodes and attributes
> from the tree, but we can't add them.  That causes problems for JS that
> needs to add new nodes.  So we're going to have to take that parse tree
> we get back from Tidy5, build our own DOM out of it, and eventually
> render it.

So the switch statement over (action) goes away, and the painstaking 
character-by-character tag recognition goes away, but maybe in return we need 
a switch statement with handling for every value, maybe grouped together 
in cases, that they list as "Known HTML element types" in the tidyenum.h 
file?

And would some or most of the old case blocks be preserved, such as:
the old case TAGACT_TABLE might resemble a new case TidyTag_TABLE
the old case TAGACT_TR might resemble a new case TidyTag_TR
...

Like you get some work done by the library, but also want a crack at these 
node types differentiated by what they are.  Is that correct?  We're still 
building the new string 'ns'.. hmmm... is more standardization possible, 
or do you still have to do a variety of things in order to add to ns 
properly?



---
Here is a second note, on what Karl said (paraphrasing): as a first step, 
how about bringing libtidy into html.c, running their parse method, and just 
carrying the output around as part of the ebWindow struct, for further 
examination without breaking what now exists?

My note on this.  I went to eb.h to see what would happen if I included 
tidy.h and added a TidyDoc to the ebWindow struct.  Interestingly, because 
of includes from includes, there is a name collision when I try to 
compile.. I think... over mkdir in plugin.c and mkdir in 
/usr/include/sys/stat.h.  Uh, maybe it's a client thing though. 
Disregard if it doesn't sound salient..

thanks.. this is fun.. I hope tidy will work
Kevin



--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists


* [Edbrowse-dev]  tidy5
  2015-08-16  5:54                     ` Kevin Carhart
@ 2015-08-16 10:38                       ` Karl Dahlke
  0 siblings, 0 replies; 19+ messages in thread
From: Karl Dahlke @ 2015-08-16 10:38 UTC (permalink / raw)
  To: Edbrowse-dev

> And would some or most of the old case blocks be preserved, such as:
> the old case TAGACT_TABLE might resemble a new case TidyTag_TABLE

Yes I'm sure we would need to do that,
but I would save all that for step 2, step 1 is just calling tidy
and holding the resulting tree in-window,
until the window is freed.

> We're still building the new string 'ns'.. hmmm...

Let ns build as it does today in step 1, but by step 2
a routine render(), perhaps in render.c, will build it by traversing our dom tree.
So we will need to catch and retain text nodes, which aren't even part of our world today.
We have some tag nodes, but no text nodes.
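
As a sketch of what such a step 2 render() might do, assuming a tree that keeps text nodes alongside tag nodes: walk the tree depth first and append each text node's content to the output string. Everything here is hypothetical (struct RNode, struct StrBuf, buf_append, and the rule that a br tag becomes a newline); the real render() would have far more cases.

```c
#include <stdlib.h>
#include <string.h>

enum NodeKind { NODE_ELEMENT, NODE_TEXT };

/* Hypothetical DOM node: data is the tag name for elements
 * and the text content for text nodes. */
struct RNode {
    enum NodeKind kind;
    const char *data;
    const struct RNode *firstChild, *nextSibling;
};

/* Growable string, standing in for the buffer the thread calls ns. */
struct StrBuf {
    char *s;
    size_t len, cap;
};

static int buf_append(struct StrBuf *b, const char *text)
{
    size_t n = strlen(text);

    if (b->len + n + 1 > b->cap) {
        size_t cap = (b->len + n + 1) * 2;
        char *p = realloc(b->s, cap);
        if (p == NULL)
            return -1;
        b->s = p;
        b->cap = cap;
    }
    memcpy(b->s + b->len, text, n + 1);   /* copies the terminator too */
    b->len += n;
    return 0;
}

/* Depth-first walk: text nodes contribute their text, a br element
 * contributes a newline, other elements only contribute their children.
 * Returns 0 on success, -1 on allocation failure. */
int render(const struct RNode *node, struct StrBuf *out)
{
    const struct RNode *c;

    if (node->kind == NODE_TEXT)
        return buf_append(out, node->data);
    if (strcmp(node->data, "br") == 0 && buf_append(out, "\n") != 0)
        return -1;
    for (c = node->firstChild; c != NULL; c = c->nextSibling)
        if (render(c, out) != 0)
            return -1;
    return 0;
}
```

Start with a zeroed StrBuf and free out.s afterwards. Note that out.s stays NULL if the tree produced no output at all.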

> there is a name collision  ... mkdir

I found the same collision when I tried to recompile an old math program
I wrote 15 years ago.
The call used to be mkdir(file); now, in most libraries, it is
mkdir(file, mode), yet sometimes mkdir(file) works anyway,
sometimes not.
I'll check into this and most likely change to the second form,
which will most likely fix the problem.
Notice mkdir has the second form in main.c.
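
For reference, a small sketch of the second form as POSIX declares it in sys/stat.h. The helper name ensure_dir and the 0700 mode are assumptions chosen for illustration; they are not what plugin.c or main.c actually use.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a directory with the two-argument POSIX form of mkdir().
 * Returns 0 if the directory exists afterwards, -1 otherwise. */
int ensure_dir(const char *path)
{
    struct stat st;

    if (mkdir(path, 0700) == 0)
        return 0;
    /* An already existing directory is fine; anything else is an error. */
    if (errno == EEXIST && stat(path, &st) == 0 && S_ISDIR(st.st_mode))
        return 0;
    return -1;
}
```

Passing the mode explicitly matches the prototype in sys/stat.h, so the call no longer conflicts with that declaration.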

> thanks.. this is fun..

It is, but a bit concerning in that I don't know if tidy5
will be maintained long term, but if not, worst case,
we can take it over which is better than writing our own html parser
from scratch as we were doing.
Hurray for open source.

Karl Dahlke


* Re: [Edbrowse-dev] tidy5
  2015-08-14  3:37               ` Karl Dahlke
@ 2015-08-16 18:10                 ` Adam Thompson
  0 siblings, 0 replies; 19+ messages in thread
From: Adam Thompson @ 2015-08-16 18:10 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Thu, Aug 13, 2015 at 11:37:54PM -0400, Karl Dahlke wrote:
> > In terms of an architecture I'm thinking of aiming to have the DOM as an
> > abstraction which can be used by both the rendering code and the js. Thus:
> > html is parsed into a node tree which is converted to our DOM objects
> > These objects are exposed to js via wrapper objects in the js world such that
> > any changes js makes are automatically passed through to the DOM
> > The renderer renders the DOM automatically on page load,
> > with support for re-rendering on a user command (with some sort of
> > notifications for js induced changes)
> > Form fields are altered in the DOM, which may or may not trigger a re-rendering
> 
> Yes, this can cause a rerender, for example onchange or onselect code,
> as exercised by the regression tests in jsrt.

Yeah, that was my thinking.

> > Any re-rendering would be partial, i.e.
> > only the changed segments of the DOM are re-rendered
> 
> This sounds like a diff between the old dom and the new,
> but it's easier to just rerender and then diff the old buffer against the new,
> and then report the lines that have changed, which is how edbrowse works today.
> Realize that a small change in dom could change the buffer
> on down the page, even into dom elements that have not changed.
> So I think you always want to just call render() and then
> diff the two buffers.
> Maybe even a diff library we can use, if not /bin/diff itself.

Yeah, I guess I'm just concerned about the js intensive pages which are
becoming much more common taking a long time to re-render,
but I can see that always doing a full re-render is the easiest and probably most
robust approach.
I'd quite like to have some smart approach to avoid making copies of unchanged
buffer lines whilst rendering, but I'm not too sure how that'd work.

> These are minor points; and you are definitely on track.
> This is where we need to be.

Thanks.

Regards,
Adam.


* Re: [Edbrowse-dev] tidy5
  2015-02-03 22:15 Karl Dahlke
@ 2015-02-03 23:41 ` Adam Thompson
  0 siblings, 0 replies; 19+ messages in thread
From: Adam Thompson @ 2015-02-03 23:41 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Tue, Feb 03, 2015 at 05:15:17PM -0500, Karl Dahlke wrote:
> > if tidy5 builds as a library (I hope it does)
> > then we build and install their code same as any other library.
> 
> Sure, that makes sense.
> So we'll see who has a chunk of time first to look into this.

I may have some time this weekend to have a look.

> If said library swallows html and gives us a tree of nodes, should we:
> 
> A) convert those nodes into the struct htmlTags we have today
> and use most of our existing machinery, incremental, or
> 
> b) follow that tree directly, build js nodes off of it, use those nodes,
> don't use our structures any more, this more of a rewrite.
> 
> I'm not expecting an answer, because we'd probably have to look
> at the code, and library, and resulting tree to answer the question.
> Just something to chew on.

From what I've seen of tidy (based on the curl example code),
I'd go for something between the two.
Basically I'm thinking that our existing tag machinery needs work anyway,
so we'd want to adapt it, but then we'd follow the tidy-generated tree,
building our DOM based on that.
We'd then have js hooks into our DOM which allow js to alter it since I suspect
doing that with the tidy tree would be somewhat more involved with the tidy
code-base than we want to get.
This also gives us greater flexibility in DOM implementation.
Once js's finished with the DOM, we'd then render it.
This logic would have to be repeated each time js alters things (with some
optimisations) to ensure we get an accurate representation of what js's done to the page.

On another DOM-related note, it seems that we probably need to move the
contents of startwindow.js out of js and into our DOM implementation since DOM
objects are supposed to be "host" objects rather than javascript objects as
we're implementing them.
This is another reason to get a fully functional C DOM implementation,
since then we can plug in tidy5 to generate the initial structure and js to do
its thing, whilst allowing the js stuff to be in C++ and the tidy5 stuff to be
in whatever we need it to be.
This isn't going to be incremental, but at some stage I think we need to just do it and
stop making things "just work" for a few days until the next thing which "just
doesn't work" pops up.

Cheers,
Adam.


* [Edbrowse-dev]  tidy5
@ 2015-02-03 22:15 Karl Dahlke
  2015-02-03 23:41 ` Adam Thompson
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Dahlke @ 2015-02-03 22:15 UTC (permalink / raw)
  To: Edbrowse-dev

> if tidy5 builds as a library (I hope it does)
> then we build and install their code same as any other library.

Sure, that makes sense.
So we'll see who has a chunk of time first to look into this.

If said library swallows html and gives us a tree of nodes, should we:

A) convert those nodes into the struct htmlTags we have today
and use most of our existing machinery, incremental, or

b) follow that tree directly, build js nodes off of it, use those nodes,
don't use our structures any more, this more of a rewrite.

I'm not expecting an answer, because we'd probably have to look
at the code, and library, and resulting tree to answer the question.
Just something to chew on.

Karl Dahlke


* Re: [Edbrowse-dev] tidy5
  2015-02-02 19:58 Karl Dahlke
@ 2015-02-03 21:18 ` Adam Thompson
  0 siblings, 0 replies; 19+ messages in thread
From: Adam Thompson @ 2015-02-03 21:18 UTC (permalink / raw)
  To: Karl Dahlke; +Cc: Edbrowse-dev


On Mon, Feb 02, 2015 at 02:58:35PM -0500, Karl Dahlke wrote:
> I'm trying to get my head around this, and one problem is I don't
> know hardly anything about git.
> If we wanted to use and follow the tidy5 package, how would we do it?
> Could we, or should we,
> git clone the package under src, so there is then an src/tidy
> directory, that would build via make,
> that we could fold into our product and build upon?
> Could we git pull from them to keep up to date with them,
> and continue to do our work on top of it?
> Or is it impossible to put one git structure beneath another?

It's probably possible, but a *really* bad idea imo for a whole number of reasons.

> If that doesn't work then what are the mechanics of following and incorporating
> another project in ours?

Well.... at the risk of stating the obvious,
if tidy5 builds as a library (I hope it does)
then we build and install their code same as any other library.
We document the requirement for whatever version we need and go from there.

No need for nesting git repos or anything like that.
If we *really need* to fork (why?) then we copy the changes between their code
and ours and take on all the pain associated with this,
but let's not unless anyone has a good reason why (like my assumption about the
librified nature of the code is incorrect).
Even then, I'd much rather take on the work of making a libtidy5 and
contributing it to the tidy5 project and then proceeding with the library
integration.

Cheers,
Adam.


* [Edbrowse-dev] tidy5
@ 2015-02-02 19:58 Karl Dahlke
  2015-02-03 21:18 ` Adam Thompson
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Dahlke @ 2015-02-02 19:58 UTC (permalink / raw)
  To: Edbrowse-dev

I'm trying to get my head around this, and one problem is I don't
know hardly anything about git.
If we wanted to use and follow the tidy5 package, how would we do it?
Could we, or should we,
git clone the package under src, so there is then an src/tidy
directory, that would build via make,
that we could fold into our product and build upon?
Could we git pull from them to keep up to date with them,
and continue to do our work on top of it?
Or is it impossible to put one git structure beneath another?
If that doesn't work then what are the mechanics of following and incorporating
another project in ours?

Karl Dahlke


Thread overview: 19+ messages
2015-08-10  8:56 [Edbrowse-dev] startwindow / class NodeList Kevin Carhart
2015-08-11 21:38 ` Adam Thompson
2015-08-12  0:15   ` Karl Dahlke
2015-08-12 19:55     ` Kevin Carhart
2015-08-12 20:56       ` Karl Dahlke
2015-08-13  1:08         ` Chris Brannon
2015-08-13  4:36           ` [Edbrowse-dev] tidy5 Kevin Carhart
2015-08-13 20:07             ` Adam Thompson
2015-08-14  0:54               ` Kevin Carhart
2015-08-14  3:45                 ` Karl Dahlke
2015-08-14 20:17                   ` Chris Brannon
2015-08-16  5:54                     ` Kevin Carhart
2015-08-16 10:38                       ` Karl Dahlke
2015-08-14  3:37               ` Karl Dahlke
2015-08-16 18:10                 ` Adam Thompson
  -- strict thread matches above, loose matches on Subject: below --
2015-02-03 22:15 Karl Dahlke
2015-02-03 23:41 ` Adam Thompson
2015-02-02 19:58 Karl Dahlke
2015-02-03 21:18 ` Adam Thompson
