edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] acid[0]
From: Karl Dahlke @ 2017-08-19 15:38 UTC
  To: Edbrowse-dev

With Kevin pointing the way, I started looking at the first of 100 acid tests.
It fails because it expects a whitespace-only text node that isn't there.
Consider the following HTML.

<body>
<p>paragraph 1</p>
<p>paragraph 2</p>
</body>

Browsing with db5 shows that tidy gives us the two paragraph nodes in sequence; there is no node between them holding the newline (whitespace) character.
The javascript expects it to be there.
Why is it not there?
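To make the mismatch concrete, here is a small sketch in which plain objects stand in for DOM nodes (they are not edbrowse's node objects). Only the node between the two paragraphs is modelled; a conforming parser would also keep the surrounding newlines as text nodes.

```javascript
// Plain objects standing in for DOM nodes; illustrative only.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// The two paragraphs and the newline between them, as a conforming parser keeps them.
const conforming = [
  { nodeType: ELEMENT_NODE, nodeName: "P" },
  { nodeType: TEXT_NODE, nodeName: "#text", data: "\n" },
  { nodeType: ELEMENT_NODE, nodeName: "P" },
];

// True for a text node made only of whitespace, like the newline between the <p> tags.
function isWhitespaceText(node) {
  return node.nodeType === TEXT_NODE && /^\s*$/.test(node.data);
}

// What we get when whitespace-only text nodes are dropped, as tidy appears to do here.
const withoutWhitespace = conforming.filter((node) => !isWhitespaceText(node));

// A script stepping from the first <p> to its next sibling sees different nodes.
function nextSiblingName(children) {
  return children[1].nodeName;
}

console.log(nextSiblingName(conforming));        // #text
console.log(nextSiblingName(withoutWhitespace)); // P
```

Any test that walks nextSibling or counts childNodes will be off by one node in the second case.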

Note html-tidy.c line 126.
I tell tidy not to drop empty elements, or empty paragraphs.
Geoff, or anyone else, any insights?

Karl Dahlke

* Re: [Edbrowse-dev] acid[0]
From: Kevin Carhart @ 2017-08-19 22:53 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



I think we're getting into CSS here.  The acid3 html file has a text/css 
section at the top including this:
   #instructions:last-child { white-space: pre-wrap; white-space: x-bogus; }

What are your feelings about css?  I have been making a claim that I think 
there's some evidence for, but I'm not positive:  Even though the 
bulk of CSS is not useful or interesting to the edbrowse renderer, we 
might still be interested in CSS because sites use the presence of 
CSS names and values as a workaround for user-agent spoofing.  The 
collection of results from poking and prodding 100 attributes is what they 
take to be your browser and OS fingerprint, overriding what you said it 
was.  Diabolical, huh?

Do you think this is a compelling reason to get into CSS? I think I have 
found some 3rd-party JS code that we might be interested in, if we wanted 
to do something with this.  It might save work. There's one object that is 
a CSS parser.  It would turn a .css file into JSON, where it is easier to 
traverse afterwards.  There is also a JS implementation of 
querySelectorAll, which works like getElementsByTagName, only the 
discernment of the result elements is based on selector syntax, rather 
than tag or name.  The colon, the period, the hash mark have particular 
hardcoded meanings for different types of selections.
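As a rough illustration of those hardcoded meanings, the toy sketch below splits one compound selector into tag, id, classes and pseudo-classes. It ignores attribute selectors, combinators and escapes, and it is not how css.js or query-selector actually tokenize.

```javascript
// Toy tokenizer for a single compound selector such as "p.snork" or
// "#instructions:last-child". No [attr], no combinators, no escapes.
function parseCompound(selector) {
  const parts = { tag: null, id: null, classes: [], pseudos: [] };
  const token = /([#.:]?)([\w-]+)/g;
  let m;
  while ((m = token.exec(selector)) !== null) {
    const [, sigil, name] = m;
    if (sigil === "#") parts.id = name;               // hash: id
    else if (sigil === ".") parts.classes.push(name); // period: class
    else if (sigil === ":") parts.pseudos.push(name); // colon: pseudo-class
    else parts.tag = name.toLowerCase();              // bare word: tag name
  }
  return parts;
}

console.log(parseCompound("p.snork"));
console.log(parseCompound("#instructions:last-child"));
```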

thanks
Kevin





--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists

* [Edbrowse-dev] acid[0]
From: Karl Dahlke @ 2017-08-19 23:08 UTC
  To: Edbrowse-dev

Well duktape has some json support out of the box.
It has a JSON global object, with JSON.parse() in it and I don't know what else, so using js to convert css to json might be a practical pathway.
Course we'd have to follow up with a function to apply bgcolor=white to foo.style wherever that makes sense.

Karl Dahlke

* Re: [Edbrowse-dev] acid[0]
From: Kevin Carhart @ 2017-08-19 23:33 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Sat, 19 Aug 2017, Karl Dahlke wrote:

> Well duktape has some json support out of the box.
> It has a JSON global object, with JSON.parse() in it and I don't know what else, so using js to convert css to json might be a practical pathway.

Correction:  I was wrong to say it was JSON.  The
parser doesn't return JSON; it returns a tree of nested objects.
The documentation simply turned it into JSON in order to serialize the 
contents of the object for readability.  So this is even more familiar, 
just traversal with recursion maybe.

> Course we'd have to follow up with a function to apply bgcolor=white to 
> foo.style wherever that makes sense.

I think that's right.  We might get the first two thirds of a three step
process done "free" by the libraries and have to write the function that
you describe.  Since querySelectorAll returns elements (foo) and the
parser css.js breaks down selectors, attribute names (bgcolor) and
attribute values (white) into neat compartments, I think it would be
(don't want to speak too soon) somewhat straightforward to dole out
bgcolor=white to foo.style.

Here is the code for the parser and then for querySelector:

https://raw.githubusercontent.com/jotform/css.js/master/css.js
https://raw.githubusercontent.com/yiminghe/query-selector/master/build/query-selector-debug.js

And here are the git projects:

https://github.com/jotform/css.js.git
https://github.com/yiminghe/query-selector.git

K

* [Edbrowse-dev] acid[0]
From: Karl Dahlke @ 2017-08-20  0:00 UTC
  To: Edbrowse-dev

Not sure what querySelectorAll is all about; can't we just call document.getElementsByTagName()?
So if an object says p.snork has bgcolor=white
then we get the array
a = document.getElementsByTagName("p");
Loop over array and if obj.class == "snork" then obj.style.bgcolor = white.
Or if the descriptor is on #instructions rather than a class of nodes, we use getElementById to find the node and then set its values.
So I think we already have the middle third, and the last third seems reasonably easy to write.
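A minimal sketch of that approach for just "tag.class" and "#id", against a fake document that returns arrays rather than live collections. In the DOM the class attribute is read through className and may hold several space-separated names, so the sketch splits it. It sets backgroundColor, since bgcolor is an HTML attribute rather than a CSS property.

```javascript
// Fake document: just enough of getElementsByTagName/getElementById for the sketch.
function makeDocument(elements) {
  return {
    getElementsByTagName: (tag) => elements.filter((el) => el.tagName === tag.toUpperCase()),
    getElementById: (id) => elements.find((el) => el.id === id) || null,
  };
}

// Handle "#id" and "tag.class" (class part optional); anything fancier is out of scope.
function selectSimple(doc, selector) {
  if (selector.startsWith("#")) {
    const el = doc.getElementById(selector.slice(1));
    return el ? [el] : [];
  }
  const [tag, cls] = selector.split(".");
  return doc
    .getElementsByTagName(tag)
    .filter((el) => cls === undefined || el.className.split(/\s+/).includes(cls));
}

// Copy one declaration onto every selected element's style.
function applyDeclaration(doc, selector, property, value) {
  for (const el of selectSimple(doc, selector)) el.style[property] = value;
}

const doc = makeDocument([
  { tagName: "P", id: "", className: "snork", style: {} },
  { tagName: "P", id: "", className: "other", style: {} },
  { tagName: "DIV", id: "instructions", className: "", style: {} },
]);
applyDeclaration(doc, "p.snork", "backgroundColor", "white");
```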

Karl Dahlke

* Re: [Edbrowse-dev] acid[0]
From: Kevin Carhart @ 2017-08-20  0:37 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



On Sat, 19 Aug 2017, Karl Dahlke wrote:

> Not sure what querySelectorAll is all about; can't we just call document.getElementsByTagName()?

It's a thing of its own.  A lot of sites' JS uses this.  For instance, in 
the nasa.gov code file vendor.js,

e.querySelectorAll("[msallowcapture^='']")
e.querySelectorAll("[selected]")
e.querySelectorAll(":checked")
a=r.querySelector("#morph-"+n)
e.querySelectorAll("[id~="+q+"-]")
e.querySelectorAll("a#"+q+"+*")

The brackets, the hash and the colon have hardcoded meanings.  The
syntax used here is, I believe, the same selector syntax you find in CSS
blocks.  There is also the period, for classes, and the at sign, which
introduces at-rules such as @font-face rather than selecting elements:

   .hidden { visibility: hidden; }
   @font-face { font-family: "AcidAhemTest"; src: url(font.ttf); }
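The bracket forms in the nasa.gov list are attribute selectors. Here is a sketch of matching a single one against an element, assuming a getAttribute that returns null for absent attributes; the element objects are fakes. Per the Selectors spec, ^=, $= and *= with an empty value match nothing, so [msallowcapture^=''] should select nothing in a conforming browser. That suggests the vendor.js line is probing for exactly that rule.

```javascript
// Match one attribute selector ([name], [name=v], [name~=v], [name|=v],
// [name^=v], [name$=v], [name*=v]) against an element. Case-sensitive, no namespaces.
const ATTR = /^\[\s*([\w-]+)\s*(?:([~^$*|]?=)\s*(?:'([^']*)'|"([^"]*)"|([^\]\s]*))\s*)?\]$/;

function matchAttr(el, selector) {
  const m = ATTR.exec(selector);
  if (m === null) throw new Error("unsupported selector: " + selector);
  const [, name, op, single, double, bare] = m;
  const actual = el.getAttribute(name);
  if (actual === null) return false; // attribute absent
  if (op === undefined) return true; // [name]: presence is enough
  const want = single ?? double ?? bare;
  switch (op) {
    case "=":  return actual === want;
    case "~=": return want !== "" && !/\s/.test(want) && actual.split(/\s+/).includes(want);
    case "|=": return actual === want || actual.startsWith(want + "-");
    case "^=": return want !== "" && actual.startsWith(want);
    case "$=": return want !== "" && actual.endsWith(want);
    case "*=": return want !== "" && actual.includes(want);
  }
  return false;
}

// Fake element with getAttribute over a plain object.
function fakeElement(attrs) {
  return {
    getAttribute: (n) => (Object.prototype.hasOwnProperty.call(attrs, n) ? attrs[n] : null),
  };
}
```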


> So if an object says p.snork has bgcolor=white
> then we get the array
> a = document.getElementsByTagName("p");
> Loop over array and if obj.class == "snork" then obj.style.bgcolor = white.
> Or if the descriptor is on #instructions rather than a class of nodes, we use getElementById to find the node and then set its values.
> So I think we already have the middle third, and the last third seems reasonably easy to write.

I don't rule out that this can be done.  It depends if you want to dig in 
to the selectors language-within-a-language or use a component 
to hopefully avoid having to.  If it's fun, that's good.  If it's 
completely undesirable to learn a new mini syntax, maybe the outside 
component can do it for us.

Even if you wanted to build it internally on top of getElementsByTagName
and friends, there would still need to be a wrapper called
querySelectorAll (and querySelector) which receives a selector string
beginning with one of those symbols.  Under the hood we can do any kind of
node math we want in order to select the results.

It's definitely possible that you will know how to do it, so that it 
would turn out to be less work than bringing in the outside code.  I 
don't know which of those is less work.

Kevin


* [Edbrowse-dev] acid[0]
From: Karl Dahlke @ 2017-08-20 14:33 UTC
  To: Edbrowse-dev

Ok Kevin, all that free software is just too irresistible!
I put both functions in startwindow.js and pushed, so everybody have a look.

The css parser works like a dream!
You can even do it stand alone.

duk -i startwindow.js

parser = new cssjs;
list = parser.parseCSS(css_string);

list is an array of the css descriptors in the string.
Each is an object with members: selector, rules, comment.
The selector is something like "p.snork".
Rules is another array of all the keyword value pairs like  bgcolor = white
It's so simple and clean.
I took the ridiculous <style> tag out of acid3 and pushed it through the parser, and it worked perfectly.
45 descriptors corresponding to the contents of that <style> tag.
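Here is a hand-built list in the shape described above, to show the difference between the objects and JSON: JSON.stringify is only for printing, and everything else is ordinary traversal. The field names inside each rule (directive, value) are my assumption about css.js's output and should be checked against the library; the comment member is omitted.

```javascript
// Hand-built stand-in for parser.parseCSS() output; not produced by css.js here.
const list = [
  {
    selector: "#instructions:last-child",
    rules: [
      { directive: "white-space", value: "pre-wrap" },
      { directive: "white-space", value: "x-bogus" },
    ],
  },
  { selector: ".hidden", rules: [{ directive: "visibility", value: "hidden" }] },
];

// JSON is just a printable view of the objects.
console.log(JSON.stringify(list, null, 2));

// Working with the result is plain iteration over arrays of objects.
function selectorsUsing(descriptors, directive) {
  return descriptors
    .filter((d) => d.rules.some((r) => r.directive === directive))
    .map((d) => d.selector);
}

console.log(selectorsUsing(list, "white-space"));
```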

querySelectorAll is not as simple.
As Kevin pointed out, other websites use the same construct, maybe even the same code, and I don't want their function to collide with our function, especially if they work somewhat differently, so I call ours eb$qs.

eb$qs("div")

But there's another problem.
The code creates the function querySelectorAll, or in our case eb$qs, and in doing so it creates a temporary <div> tag.
That doesn't work unless we have a framework in place.
So I put a wrapper around it:  eb$qs$start().
That sets everything up to then run eb$qs as often as you like.
Eventually edbrowse will call eb$qs$start() after the html document is browsed and before the first javascript runs.

eb$qs$start()
parser.parseCSS() on every style tag in the document and every file <link type=css href=>
Then map those values onto the objects by applying eb$qs to each selector in each css descriptor.
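Those last two steps could look roughly like this. Here select is a placeholder for eb$qs (its real signature may differ), the descriptors use the {selector, rules} shape from the parser, and property names are copied exactly as written in the stylesheet. The validity table is an addition of mine: CSS ignores a declaration whose value is invalid, which is why acid3's white-space: x-bogus should leave pre-wrap in place. A real implementation would need validation for far more than one property.

```javascript
// Hypothetical validity table; only white-space is checked in this sketch.
const VALID_VALUES = {
  "white-space": new Set(["normal", "pre", "nowrap", "pre-wrap", "break-spaces", "pre-line"]),
};

function isValidDeclaration(directive, value) {
  const allowed = VALID_VALUES[directive];
  return allowed === undefined || allowed.has(value); // unknown properties pass through
}

// Apply parsed descriptors in source order; later valid declarations win.
// `select` stands in for eb$qs and must return an array of elements.
function applyDescriptors(descriptors, select) {
  for (const descriptor of descriptors) {
    for (const el of select(descriptor.selector)) {
      for (const rule of descriptor.rules) {
        if (isValidDeclaration(rule.directive, rule.value)) {
          el.style[rule.directive] = rule.value;
        }
      }
    }
  }
}

// Usage with a fake element and a fake selector function.
const instructions = { id: "instructions", style: {} };
applyDescriptors(
  [{ selector: "#instructions:last-child",
     rules: [{ directive: "white-space", value: "pre-wrap" },
             { directive: "white-space", value: "x-bogus" }] }],
  (selector) => (selector.startsWith("#instructions") ? [instructions] : []),
);
console.log(instructions.style["white-space"]); // pre-wrap
```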

But there are more problems.
Try it with jsrt.
Set db3 so you can see what is going on.
browse, jdb, eb$qs$start()
and now you're ready to go.
list = eb$qs("script");
Holy crap it works, list is an array of 9 objects for the 9 scripts in jsrt.
Step through each one and look at list[i].data.
That is the contents of each script.
This also works for "p" "a", and other such things.
It doesn't work for "table.filbert", even though we have a <table class=filbert> tag.
It calls a method getAttributeNode which we don't have. Oops.
That's probably our omission, and something we should address anyways.

Then try eb$qs("#jkl");
That doesn't work either.
We are missing the compareDocumentPosition() method.

So we can't move forward on this until we fill in some missing pieces in our DOM.
Any volunteers to implement getAttributeNode() or compareDocumentPosition()?
The former is easier, and more important, than the latter.
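For whoever picks up compareDocumentPosition(), here is a rough sketch of the core logic over plain objects with parentNode and a childNodes array. The bit values are the standard Node constants. One simplification: disconnected nodes only get DOCUMENT_POSITION_DISCONNECTED, whereas the spec also sets IMPLEMENTATION_SPECIFIC plus a consistent PRECEDING or FOLLOWING bit.

```javascript
const DOCUMENT_POSITION_DISCONNECTED = 0x01;
const DOCUMENT_POSITION_PRECEDING = 0x02;
const DOCUMENT_POSITION_FOLLOWING = 0x04;
const DOCUMENT_POSITION_CONTAINS = 0x08;
const DOCUMENT_POSITION_CONTAINED_BY = 0x10;

// Path from the root down to node, root first.
function pathFromRoot(node) {
  const path = [];
  for (let n = node; n !== null; n = n.parentNode) path.unshift(n);
  return path;
}

// Position of `other` relative to `ref`, as in ref.compareDocumentPosition(other).
function compareDocumentPosition(ref, other) {
  if (ref === other) return 0;
  const a = pathFromRoot(ref);
  const b = pathFromRoot(other);
  if (a[0] !== b[0]) return DOCUMENT_POSITION_DISCONNECTED; // simplified: spec sets more bits
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  if (i === a.length) return DOCUMENT_POSITION_CONTAINED_BY | DOCUMENT_POSITION_FOLLOWING; // other is inside ref
  if (i === b.length) return DOCUMENT_POSITION_CONTAINS | DOCUMENT_POSITION_PRECEDING;     // other encloses ref
  const siblings = a[i - 1].childNodes; // children of the deepest common ancestor
  return siblings.indexOf(a[i]) < siblings.indexOf(b[i])
    ? DOCUMENT_POSITION_FOLLOWING
    : DOCUMENT_POSITION_PRECEDING;
}

// Build a fake node whose children point back at it.
function fakeNode(name, children = []) {
  const node = { name, parentNode: null, childNodes: children };
  for (const child of children) child.parentNode = node;
  return node;
}
```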

Karl Dahlke

* Re: [Edbrowse-dev] acid[0]
From: Kevin Carhart @ 2017-08-20 20:00 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



Wow!  Thank you for doing all of this!

> It's so simple and clean.

Yes, the parser in particular seems to be very high quality.  The 
developers have notes that call it "battle tested".

> querySelectorAll is not as simple. As Kevin pointed out, other websites 
> use the same construct, maybe even the same code, and I don't want their 
> function to collide with our function, especially if they work somewhat 
> differently, so I call ours eb$qs.

I'd like to clarify something here.  When nasa.gov calls querySelectorAll, 
it is on the same footing as appendChild as far as that web developer is
concerned.  They expect it to be provided by the browser, which for all
they know is a compiled, closed-source browser.  So isn't "collision" the
wrong word for the situation?  Other websites use the same construct,
but only to call it; they expect the browser to provide it.

There's nothing wrong with calling ours eb$qs, but are we then going to 
create a wrapper so that page code can lock on to it by name?

- There's one more thing to mention that might be relevant.  It's 
wonderful that you dove in!  We might need to adapt the querySelector
code for browsers rather than Node.js, the runtime for server-side
javascript.  (I have used Node, but not that much.)  If there are
references to "exports", I think these need to be removed.  I have definitely gotten qS
working with edbrowse in the past!  But I have not gone through the 
motions recently.  Maybe you are way ahead of me if you got it working.
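Instead of deleting the exports references by hand, one common pattern is to guard them so the same file works both under Node and as a plain script. This is a sketch with querySelectorLib as a made-up global name; I have not checked whether the real query-selector build is structured this way. It sticks to ES5 var/function since the target is duktape.

```javascript
(function (root, factory) {
  var api = factory();
  if (typeof module === "object" && module !== null && module.exports) {
    module.exports = api;          // Node / CommonJS: export as before
  } else {
    root.querySelectorLib = api;   // plain script (e.g. startwindow.js): attach to the global object
  }
})(typeof globalThis !== "undefined" ? globalThis : this, function () {
  // The library body would go here; a trivial stand-in is returned instead.
  function version() {
    return "sketch";
  }
  return { version: version };
});
```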

> It calls a method getAttributeNode which we don't have. Oops.

It's entirely possible that I implemented this in the same experimental 
build where I got qS working and have never turned it in.  I will check.


Kevin


* [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-20 20:08 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



Didn't we have this at one time?  Maybe not, I don't remember.  It is 
basically a scalar-to-object converter.  Given a string, it wants the 
balloon blown up.  It wants an attribute node whose name is the passed in 
string.

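         // Proposal: hang these off document (a real DOM puts them on elements).
         document.getAttributeNode = function (name)
         {
         // var keeps rv local instead of leaking a global.
         // createElement("Attr") yields an ordinary element standing in for an Attr node.
         var rv = document.createElement("Attr");
         rv.setAttribute(name, this[name.toLowerCase()]);
         return rv;
         }
         // Note: the standard setAttributeNode takes one Attr node; this takes (name, value).
         document.setAttributeNode = function (name, v) {
         this.attributes[name.toLowerCase()] = v;
         this[name.toLowerCase()] = v;
         }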



* [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Karl Dahlke @ 2017-08-20 20:24 UTC
  To: Edbrowse-dev

Well as you see, I implemented getAttributeNode(), because it wasn't hard, though it was a little harder than your example suggests because of side effects.
Setting the attribute node's value has to propagate down to setAttribute on the owner element, which I do with a setter.
With this in place, much of eb$qs is working.
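A sketch of that setter approach, written as a standalone function over a fake element rather than as edbrowse's actual code: the returned attribute object has a value accessor, so writing to it calls setAttribute on the owner element. Returning null for a missing attribute follows the DOM; whether edbrowse does the same is not something I checked.

```javascript
// Fake element storing attributes in a plain object.
function fakeElement() {
  const attrs = {};
  return {
    getAttribute: (name) => (Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null),
    setAttribute: (name, value) => { attrs[name] = String(value); },
  };
}

function getAttributeNode(el, name) {
  const key = name.toLowerCase();              // HTML attribute names are case-insensitive
  if (el.getAttribute(key) === null) return null;
  const attr = { name: key, nodeType: 2, ownerElement: el }; // 2 = ATTRIBUTE_NODE
  Object.defineProperty(attr, "value", {
    enumerable: true,
    get: () => el.getAttribute(key),           // always read through to the element
    set: (v) => el.setAttribute(key, v),       // writes propagate back to the element
  });
  return attr;
}

const table = fakeElement();
table.setAttribute("class", "filbert");
const classAttr = getAttributeNode(table, "CLASS");
classAttr.value = "snork";
console.log(table.getAttribute("class")); // snork
```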

Sounds like I misunderstood though, and it should really be called querySelectorAll, but that's just a one line change if we want to do that.
Let me know if that's what we should do.

I notice inside the code it checks navigator.userAgent, so it tailors itself to the kind of browser we are.
God knows what it does with edbrowse.   :)
Anyways, to make this all work standalone, without edbrowse,  duk -i startwindow.js, I had to put in something for navigator.userAgent, or it was blowing up.
Line 195.

Karl Dahlke

* Re: [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-20 20:56 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev

On Sun, 20 Aug 2017, Karl Dahlke wrote:

> Well as you see, I implemented getAttributeNode(), because it wasn't hard, but a little harder than your example suggests because of side effects.

Ah!  Thank you.
>
> Sounds like I misunderstood though, and it should really be called querySelectorAll, but that's just a one line change if we want to do that.
> Let me know if that's what we should do.

Well.. I believe so, in the same way that we have apch, but pages by 
some random web developer in the world expect to lock on to appendChild. 
It's the DOM.  querySelector and querySelectorAll are part of the DOM as 
far as they are concerned.  We just happen to be implementing them in open 
javascript.
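If we go that way, one option is to keep eb$qs as the internal name and install the standard names on top of it. In this sketch engine stands for eb$qs with an assumed (selector, root) signature, and proto is whatever prototype edbrowse's document and elements share; both are placeholders. The DOM's querySelector returns the first match or null.

```javascript
// Install querySelectorAll/querySelector on `proto`, delegating to an internal engine.
function installSelectorApi(proto, engine) {
  proto.querySelectorAll = function (selector) {
    return engine(selector, this);
  };
  proto.querySelector = function (selector) {
    const matches = engine(selector, this);
    return matches.length > 0 ? matches[0] : null;
  };
}

// Usage with a fake engine that matches items by tag name under `root.items`.
const fakeEngine = (selector, root) => root.items.filter((item) => item.tag === selector);
const proto = {};
installSelectorApi(proto, fakeEngine);
const doc = Object.create(proto);
doc.items = [{ tag: "p", n: 1 }, { tag: "div", n: 2 }, { tag: "p", n: 3 }];
console.log(doc.querySelectorAll("p").length); // 2
console.log(doc.querySelector("table"));       // null
```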

> I notice inside the code it checks navigator.userAgent, so it tailors itself to the kind of browser we are.

Yes.. I remember having a problem with a couple of lines that I think test 
for an IE version.

I remember that the qS code has some multibyte Asian characters in some
comments.  I'll track them down later.  Maybe they will sit merrily and be 
ignored, but I'm worried that they would make startwindow garbled if 
someone was compiling from source and didn't have a charset that renders 
these alphabets.  Maybe it's fine.
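To track those comments down, here is a quick sketch that reports which lines of a source string contain anything outside 7-bit ASCII; reading startwindow.js from disk is left out.

```javascript
// Return 1-based line numbers of lines containing a character above U+007F.
function nonAsciiLines(source) {
  const hits = [];
  source.split("\n").forEach((line, index) => {
    if (/[^\x00-\x7F]/.test(line)) hits.push(index + 1);
  });
  return hits;
}

const sample = ["var a = 1;", "// 选择器 (selector) note", "var b = 2;"].join("\n");
console.log(nonAsciiLines(sample)); // [ 2 ]
```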


* Re: [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-20 21:59 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



I don't think any of what you are doing is incorrect though.  There are 
multiple sub-projects going on at once almost!  It is very, very, very 
wonderful to parse and propagate style blocks and .css files.  That is a
quantum leap, iframe support is a quantum leap, and I am in heaven.

So if we ourselves call qS almost like a private member that would not be 
exposed to pages, this is great.  We can call it anything we want.

There are two potential use cases for qS: the same code can do two kinds
of jobs.

In the first job, we call qS ourselves, as part of what I think is a
three-step process: (a) parse CSS sections, (b) identify and return the
set of elements that the styles need to be doled out to, and (c) dole out
the styles to that set of elements.

I leapt ahead without explaining what I was on about.  The second job
exists because pages *also* call querySelector and
querySelectorAll.  It's separate in some ways - it is separate from us
being a web browser and doing an integral, fundamental thing with styles 
information.  It is more like the toolbelt of the web designer.  The web 
designer calls getElementsByTagName("blah") in one function and then calls 
querySelectorAll(":blah") or querySelectorAll(".blah") in the next.

One job is low level and internal, and the other job is high level and 
external, but they both use qS to process the selectors mini-language and 
then search the tree.  (Those terms "high level" and "low level" are so 
overloaded both in technical settings and regular society or whatnot that 
they are really useless.  But I hope you get what I'm describing.  If 
low-level is taken to mean, less about aesthetics, fundamental 
architecture of a web browser per se, that's the first job.  If high-level 
is taken to mean, scripters and designers who build websites, that's the 
second job.)

Does that make sense?  Sorry if by overlapping two use cases I made 
anything confusing.  It's like water rushing downstream because I am so 
excited about both of the scenarios!!






* Re: [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-21 19:11 UTC
  To: Edbrowse-dev

On Mon, 21 Aug 2017, Karl Dahlke wrote:

> So css attributes from <style> tags or from <link> css files now apply to the objects as they should. It's cool.
> See tests 164 and 165 in jsrt.

This is great!  I tried it out last night a bit.

>
> Still acid 0 is a long way away.

Yes, I went to acid 0.  I think we are very close.  'last' and 
'penultimate' did not used to have anything in them correctly, and now 
they do.  The assertion at the end that uses computedStyle would even work 
if the property being retrieved happened to be one of the ones propagated 
by our CSS code.  qS("#instructions:last-child") returns zero elements. 
It isn't picking up penultimate.  But all of the earlier steps prior to 
the last line are working.

> One of the mysteries remaining is they set "white-space" = "pre-wrap" in the style block, but then the test checks for .whiteSpace.
> Now how when or why does white-space equate to whiteSpace? I don't get that.

Aha!  I found something out about this.  There is this DOM 
implementation by Thatcher et al, called env.js.  I used it a couple of 
years ago with an edbrowse 3.3.1 before we started ours.  I learned a lot 
from using it.  They have CSS-related code, and they have the following 
internal routines:

var __toCamelCase__ = function(name) {
     if (name) {
         return name.replace(/\-(\w)/g, function(all, letter) {
             return letter.toUpperCase();
         });
     }
     return name;
};

var __toDashed__ = function(camelCaseName) {
     if (camelCaseName) {
         return camelCaseName.replace(/[A-Z]/g, function(all) {
             return '-' + all.toLowerCase();
         });
     }
     return camelCaseName;
};


So I conclude that formalized conversion of camel case to/from dashed CSS 
is a thing.  I think that may be the missing link or one of them.
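One way to make both spellings work is a style object that normalizes names on the way in and out, so style["white-space"] and style.whiteSpace refer to the same slot (CSSOM also exposes both forms on real style objects). This is a sketch using a JavaScript Proxy; whether duktape's Proxy support covers these get/set traps in our build is an assumption to check.

```javascript
// Convert a dashed CSS name to camelCase, in the spirit of env.js's __toCamelCase__.
function toCamelCase(name) {
  return name.replace(/-([a-z])/g, (all, letter) => letter.toUpperCase());
}

// Style object whose keys are stored in camelCase no matter how they are written.
function makeStyle() {
  return new Proxy({}, {
    get(target, prop) {
      return typeof prop === "string" ? target[toCamelCase(prop)] : target[prop];
    },
    set(target, prop, value) {
      if (typeof prop === "string") target[toCamelCase(prop)] = value;
      else target[prop] = value;
      return true;
    },
  });
}

const style = makeStyle();
style["white-space"] = "pre-wrap";   // as written in the <style> block
console.log(style.whiteSpace);       // pre-wrap
console.log(Object.keys(style));     // [ 'whiteSpace' ]
```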

Kevin


* [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Karl Dahlke @ 2017-08-21 20:01 UTC
  To: Edbrowse-dev

Ok I now convert foo-bar to fooBar, as you suggest, and as the acid test 0 suggests, but I think it's weird.

You say you're not finding the right properties in penultimate, but oh boy it's very subtle.
There are several problems at play.
The selector we're looking for is #instructions:last-child, and I had to read some of the MIT code to see what that meant.
It means the node with id=instructions, and it has to be the last nontrivial child of its parent, where nontrivial means nodeType 1 (an element).
So a silly empty whitespace node doesn't count.
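That rule is easy to state in code. Here is a sketch over fake nodes linked with nextSibling (ELEMENT_NODE is 1, TEXT_NODE is 3): a trailing whitespace text node does not stop an element from being the last child, but a following element does.

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// True if no element (nodeType 1) follows `el` among its siblings.
function isLastElementChild(el) {
  for (let n = el.nextSibling; n !== null; n = n.nextSibling) {
    if (n.nodeType === ELEMENT_NODE) return false;
  }
  return true;
}

// "#instructions:last-child" tested against a single element.
function matchesInstructionsLastChild(el) {
  return el.id === "instructions" && isLastElementChild(el);
}

// Link fake siblings together in order, returning them.
function siblings(...nodes) {
  nodes.forEach((node, i) => { node.nextSibling = nodes[i + 1] || null; });
  return nodes;
}

const [instructions] = siblings(
  { nodeType: ELEMENT_NODE, id: "instructions" },
  { nodeType: TEXT_NODE, data: "\n" },
);
console.log(matchesInstructionsLastChild(instructions)); // true
```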
At the time the acid test runs, and at the time it calls getComputedStyle() to make its calculation,
it has already removed the paragraph after instructions, and the instructional paragraph is indeed the last child of its parent.
So getComputedStyle creates a style object for this node, and it should have whiteSpace set properly, but it's just a dynamically created style object, not the actual style attached to the node.
That actual style is the one we might be messing with, changing it to green etc.
getComputedStyle simply tells you what the style would be, right now, if all the rules were applied.
After this test runs, and succeeds or fails, another script runs and does a document.write which adds all sorts of nodes to body.
So now the browse is done, and you get into jdb, and you try to reproduce this stuff, but you can't,
because instructions isn't the last child of its parent any more.
It was but it isn't any more, so the machinery looks like it's not working but it works just fine.
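To illustrate the distinction between the node's own style and the computed one, here is a sketch that builds a fresh object from every matching rule and leaves el.style untouched. It is simplified: rules are applied in source order rather than by specificity, inline style simply wins, and there is no inheritance. Here matches is a placeholder for the selector engine.

```javascript
// Build a new object from all matching rules, then overlay inline style.
// Source order only; specificity, !important and inheritance are ignored.
function computeStyle(el, descriptors, matches) {
  const computed = {};
  for (const descriptor of descriptors) {
    if (!matches(el, descriptor.selector)) continue;
    for (const rule of descriptor.rules) computed[rule.directive] = rule.value;
  }
  return Object.assign(computed, el.style); // inline declarations take precedence here
}

const el = { id: "instructions", style: { color: "green" } };
const descriptors = [
  { selector: "#instructions", rules: [{ directive: "white-space", value: "pre-wrap" }] },
  { selector: "#other", rules: [{ directive: "color", value: "red" }] },
];
const matches = (node, selector) => selector === "#" + node.id; // stand-in for eb$qs
const computed = computeStyle(el, descriptors, matches);
console.log(computed["white-space"], computed.color); // pre-wrap green
console.log(el.style["white-space"]);                 // undefined
```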

So - I think we are just one step away from test 0 passing.
The test expects a node between the two paragraphs corresponding to the newline character: a whitespace-only text node, of nodeType 3,
but tidy doesn't give us this node, so nothing lines up.
I asked Geoff about this and am waiting for his reply.
If tidy doesn't give us those nodes, then acid test 0 will never pass.

Karl Dahlke

* Re: [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-24  9:54 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



I was visiting my parents and afk, but I am back now and excited by the 
latest.

On Mon, 21 Aug 2017, Karl Dahlke wrote:

> Ok I now convert foo-bar to fooBar, as you suggest, and as the acid test 0 suggests, but I think it's weird.
>
> You say you're not finding the right properties in penultimate, but oh boy it's very subtle.

Yeah.

> There are several problems at play.
> The selector we're looking for is #instructions:last-child, and I had to read some of the MIT code to see what that meant.

Interesting.  What is the MIT code?  Is it like a CSS spec?


> It was but it isn't any more, so the machinery looks like it's not working but it works just fine.
>
> So - I think we are just one step away from test 0 passing.

Exactly!!  I am glad you went there because now we both have our bearings 
in the same stuff.  I completely agree about things getting clobbered 
later, creating the suggestion at jdb-time that it isn't working.  I 
labored under this misapprehension for a while and wasted time before 
figuring this out.  So as a workaround: wget
http://acid3.acidtests.org, save it locally
as index.html or another name, and then add alerts in the "test 0" code so
that you get your feedback when it actually runs, not later from jdb.


> but tidy doesn't give us this node, so nothing lines up.

Ah, is that right!  So this is where we came in.  You mentioned this a 
couple days ago and that was when I brought up the CSS components.  So now 
we are really getting down to the problem.  Amazing how much they pack 
into test zero.

Woo!  I am literally doing a little dance every day about new edbrowse.

In honor of the fact that we are working on these Stylistic issues, the 
official soundtrack of edbrowse-dev, for a while at least, will be 
"Betcha By Golly Wow" by The Stylistics.

Kevin

* Re: [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-24  9:57 UTC
  To: Karl Dahlke; +Cc: Edbrowse-dev



> The selector we're looking for is #instructions:last-child, and I had to read some of the MIT code to see what that meant.

Oops, you're talking about the MIT-licensed code from Jotform and 
yiminghe, right?  I get it now.


* Re: [Edbrowse-dev] getAttributeNode / setAttributeNode
From: Kevin Carhart @ 2017-08-25  8:19 UTC
  To: Edbrowse-dev



Thank you for the writeup of the routines in qS (my abbreviation for the 
third party querySelector code) that are called for 
#instructions:last-child!  This writeup is very helpful.

> The test expects a node between the two paragraphs corresponding to the newline character: a whitespace-only text node, of nodeType 3,
> but tidy doesn't give us this node, so nothing lines up.
> I asked Geoff about this and am waiting for his reply.

So does this mean that all pages should have tons of these nodes all over 
the place?

I guess we will know soon enough when certain persons are available.  :=))
Or we could start a thread about this under Issues.

But I am trying to play along in case I can make some headway now.  Maybe 
it is comparable to the options we already set in html-tidy.c:
         tidyOptSetBool(tdoc, TidyEscapeScripts, no);
         tidyOptSetBool(tdoc, TidyDropEmptyElems, no);
         tidyOptSetBool(tdoc, TidyDropEmptyParas, no);


My candidates so far are TidyNewline and TidyEmptyTags.  I don't know what 
they do yet - those are just the ones with plausible names.

For anyone reading who doesn't already know, there is a long list of 
tidy config options under tidy-html5-master/src, FYI.

Kevin

* [Edbrowse-dev] whitespace nodes
From: Kevin Carhart @ 2017-08-25 22:09 UTC
  To: Edbrowse-dev



I haven't been able to get additional nodes created out of newlines just 
by adding a certain tidyOptSet. 
I tried one called TidyLiteralAttribs, but that one governs whitespace
inside attribute values, and the whitespace we want is in between tags,
like "negative space", so to speak.



* [Edbrowse-dev] whitespace nodes
From: Karl Dahlke @ 2017-08-25 22:56 UTC
  To: Edbrowse-dev

> I haven't been able to get additional nodes created out of newlines

Not sure how hard we should work on this, or even if we want it, just to pass an acid test.
It probably has no bearing in the real world, and who wants all those empty nodes cluttering up the tree?
For now I think we should just delete or comment out line 227 in the acid test file,
it's just understood that this line is nulled out, then test 0 should pass and we move on.
Let's get the value out of the acid tests without becoming obsessed over them.
That's my gut feeling right now.

Karl Dahlke

* [Edbrowse-dev] (something other than) whitespace nodes
  2017-08-25 22:56                                 ` Karl Dahlke
@ 2017-08-26  4:25                                   ` Kevin Carhart
  2017-09-02  9:03                                     ` Adam Thompson
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Carhart @ 2017-08-26  4:25 UTC (permalink / raw)
  To: Edbrowse-dev



Thanks for pointing this out.  I guess I took something overly literal 
that is not a part of the generic principle they're getting at in the 
test.  Clearly node-ifying every '\n' in every web page isn't common or 
important or we would have hit it previously. I could have keyed in to 
this fact sooner.  Oh well.  I was only in tidy for a short time, and the 
exploration seems useful anyhow.



On Fri, 25 Aug 2017, Karl Dahlke wrote:

>> I haven't been able to get additional nodes created out of newlines
>
> Not sure how hard we should work on this, or even if we want it, just to pass an acid test.
> It probably has no bearing in the real world, and who wants all those empty nodes cluttering up the tree?
> For now I think we should just delete or comment out line 227 in the acid test file,
> it's just understood that this line is nulled out, then test 0 should pass and we move on.
> Let's get the value out of the acid tests without becoming obsessed over them.
> That's my gut feeling right now.
>
> Karl Dahlke

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists

* Re: [Edbrowse-dev] (something other than) whitespace nodes
  2017-08-26  4:25                                   ` [Edbrowse-dev] (something other than) " Kevin Carhart
@ 2017-09-02  9:03                                     ` Adam Thompson
  2017-09-02 15:42                                       ` Karl Dahlke
  0 siblings, 1 reply; 22+ messages in thread
From: Adam Thompson @ 2017-09-02  9:03 UTC (permalink / raw)
  To: Kevin Carhart; +Cc: Edbrowse-dev

First of all thanks for all the work you've all done on this and apologies for
going silent... again... Hopefully this time I'll keep my computers working at
least long enough to participate in discussions again.

On Fri, Aug 25, 2017 at 09:25:42PM -0700, Kevin Carhart wrote:
> 
> Thanks for pointing this out.  I guess I took something overly literal that
> is not a part of the generic principle they're getting at in the test.
> Clearly node-ifying every '\n' in every web page isn't common or important
> or we would have hit it previously. I could have keyed in to this fact
> sooner.  Oh well.  I was only in tidy for a short time, and the exploration
> seems useful anyhow.

Tbh it sounds like it wasn't wasted time in that we now understand that this
could be a thing in the future (although it sounds like a strange thing which is
probably why tidy doesn't do it).  I'd also say that the more we know about the
tidy code the better so, as you say, the exploration was probably worth it.
Anyway I agree with safely ignoring the missing newline node because... who cares
about blank text nodes (which is, I guess, what this would be).  Maybe we'll need
this in the future, but I can't imagine why.

Cheers,
Adam.

* [Edbrowse-dev] (something other than) whitespace nodes
  2017-09-02  9:03                                     ` Adam Thompson
@ 2017-09-02 15:42                                       ` Karl Dahlke
  0 siblings, 0 replies; 22+ messages in thread
From: Karl Dahlke @ 2017-09-02 15:42 UTC (permalink / raw)
  To: Edbrowse-dev


Geoff has confirmed that tidy does not mess with intervening whitespace, certainly doesn't turn it into empty nodes, and isn't likely to in the future.
After all, the html spec says such space is meaningless, so he's following the spec.
But then acid3 assumes every browser creates these empty whitespace nodes. It's a contradiction.
I'm not gonna worry about it.
Just know that to pass acid test 0 you have to delete or comment out line 227, and on we go.
I don't think this will ever be a problem in the real world.
In fact the :first-child and :last-child CSS pseudo-classes are defined to match the first or last "real" (element) node under a parent,
so if for some reason the browser cranks out empty whitespace nodes, those are ignored.
They designed it to work no matter how your browser behaves.
I read the jotform code and it screens for nodeType == 1, real nodes. In other words, I think we're fine and we can move on to something else.

Karl Dahlke

Thread overview: 22+ messages
2017-08-19 15:38 [Edbrowse-dev] acid[0] Karl Dahlke
2017-08-19 22:53 ` Kevin Carhart
2017-08-19 23:08   ` Karl Dahlke
2017-08-19 23:33     ` Kevin Carhart
2017-08-20  0:00       ` Karl Dahlke
2017-08-20  0:37         ` Kevin Carhart
2017-08-20 14:33           ` Karl Dahlke
2017-08-20 20:00             ` Kevin Carhart
2017-08-20 20:08               ` [Edbrowse-dev] getAttributeNode / setAttributeNode Kevin Carhart
2017-08-20 20:24                 ` Karl Dahlke
2017-08-20 20:56                   ` Kevin Carhart
2017-08-20 21:59                     ` Kevin Carhart
     [not found]                       ` <20170721105041.eklhad@comcast.net>
2017-08-21 19:11                         ` Kevin Carhart
2017-08-21 20:01                           ` Karl Dahlke
2017-08-24  9:54                             ` Kevin Carhart
2017-08-24  9:57                             ` Kevin Carhart
2017-08-25  8:19                             ` Kevin Carhart
2017-08-25 22:09                               ` [Edbrowse-dev] whitespace nodes Kevin Carhart
2017-08-25 22:56                                 ` Karl Dahlke
2017-08-26  4:25                                   ` [Edbrowse-dev] (something other than) " Kevin Carhart
2017-09-02  9:03                                     ` Adam Thompson
2017-09-02 15:42                                       ` Karl Dahlke
