edbrowse-dev - development list for edbrowse
From: Kevin Carhart <kevin@carhart.net>
To: Karl Dahlke <eklhad@comcast.net>
Cc: Edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] acid[0]
Date: Sat, 19 Aug 2017 15:53:58 -0700 (PDT)
Message-ID: <alpine.LRH.2.03.1708191537320.6887@carhart.net>
In-Reply-To: <20170719113834.eklhad@comcast.net>



I think we're getting into CSS here.  The acid3 html file has a text/css 
section at the top including this:
   #instructions:last-child { white-space: pre-wrap; white-space: x-bogus; }

What are your feelings about CSS?  I have been making a claim that I think
there's some evidence for, though I'm not positive:  Even though the
bulk of CSS is not useful or interesting to the edbrowse renderer, we
might still care about it, because sites probe for the presence of
CSS property names and values as a way to see through user-agent
spoofing.  The collection of results from poking and prodding 100
attributes is what they take to be your browser and OS fingerprint,
overriding what your user-agent string said.  Diabolical, huh?
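
To make the claim concrete, here is a minimal sketch of that kind of
probing.  It is an assumption about how such scripts work, not code taken
from any real site: probeStyle, fakeStyle, and the property list are all
hypothetical, and fakeStyle stands in for the element.style object a
browser would supply.

```javascript
// Hypothetical sketch: probe a style object for CSS property names.
// probeStyle and the names below are illustrative only.
function probeStyle(style, names) {
  const result = {};
  for (const name of names) {
    // A supported property shows up as a key on element.style,
    // even when its current value is the empty string.
    result[name] = name in style;
  }
  return result;
}

// Stand-in for element.style in a browser that knows Mozilla-prefixed names.
const fakeStyle = { whiteSpace: "", MozAppearance: "" };
const probes = ["whiteSpace", "MozAppearance", "WebkitAppearance"];
const fingerprint = probeStyle(fakeStyle, probes);
console.log(JSON.stringify(fingerprint));
```

A script could compare such a table against known browser profiles, so a
style object with no CSS names at all would stand out regardless of the
user-agent string.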

Do you think this is a compelling reason to get into CSS?  I think I have
found some 3rd-party JS code that we might be interested in, if we wanted
to do something with this.  It might save work.  One object is a CSS
parser.  It turns a .css file into JSON, which is easier to traverse
afterwards.  There is also a JS implementation of querySelectorAll,
which works like getElementsByTagName, except that the result elements
are chosen by selector syntax rather than by tag or name.  The colon,
the period, and the hash mark each have a hardcoded meaning for a
different type of selection.
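
As a rough illustration of both ideas, the sketch below turns the acid3
rule quoted earlier into a JSON-friendly structure and then splits its
selector on the hash mark, period, and colon.  This is not the 3rd-party
code; parseRules and splitSelector are hypothetical names, and the sketch
ignores comments, at-rules, combinators, and attribute selectors.

```javascript
// Minimal sketch, not the third-party parser mentioned above.
// parseRules and splitSelector are hypothetical names.

// Turn "selector { prop: value; ... }" blocks into a plain array.
function parseRules(cssText) {
  const rules = [];
  const ruleRe = /([^{}]+)\{([^{}]*)\}/g;
  let m;
  while ((m = ruleRe.exec(cssText)) !== null) {
    const declarations = [];
    for (const part of m[2].split(";")) {
      const colon = part.indexOf(":");
      if (colon < 0) continue; // skip empty fragments after the last ";"
      declarations.push({
        property: part.slice(0, colon).trim(),
        value: part.slice(colon + 1).trim(),
      });
    }
    rules.push({ selector: m[1].trim(), declarations });
  }
  return rules;
}

// Split one compound selector into tag, id, classes, and pseudo-classes.
function splitSelector(selector) {
  const out = { tag: null, id: null, classes: [], pseudo: [] };
  const tokenRe = /([#.:]?)([\w-]+)/g;
  let m;
  while ((m = tokenRe.exec(selector)) !== null) {
    if (m[1] === "#") out.id = m[2];
    else if (m[1] === ".") out.classes.push(m[2]);
    else if (m[1] === ":") out.pseudo.push(m[2]);
    else out.tag = m[2];
  }
  return out;
}

const css =
  "#instructions:last-child { white-space: pre-wrap; white-space: x-bogus; }";
const rules = parseRules(css);
const parts = splitSelector(rules[0].selector);
console.log(JSON.stringify(rules, null, 2));
console.log(JSON.stringify(parts));
```

The sketch keeps both white-space declarations as written; deciding that
x-bogus is invalid and should be dropped would be a separate step.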

thanks
Kevin




On Sat, 19 Aug 2017, Karl Dahlke wrote:

> With Kevin pointing the way, I started looking at the first of 100 acid tests.
> It runs into a problem in that it expects a pure whitespace node that is not there.
> Note the following html.
>
> <body>
> <p>paragraph 1</p>
> <p>paragraph 2</p>
> </body>
>
> Browse with db5 and tidy gives us the two paragraph nodes in sequence; there is no node in between with the newline (whitespace) character.
> The javascript expects it to be there.
> Why is it not there?
>
> Note html-tidy.c line 126.
> I tell tidy not to drop empty elements, or empty paragraphs.
> Geoff, or anyone else, any insights?
>
> Karl Dahlke
>
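
For reference, a toy model of the node sequence the test's JavaScript
appears to expect, assuming standard DOM behavior where whitespace between
elements becomes a text node (nodeType 3).  childNodesOf is a hypothetical
helper that handles only simple, non-nested elements; it is not an
edbrowse or tidy function.

```javascript
// Toy model only: a real HTML parser (tidy or a browser) does far more.
// childNodesOf is a hypothetical helper for illustration.
function childNodesOf(innerHtml) {
  const nodes = [];
  // Either a simple, non-nested element or a run of text between tags.
  const re = /<(\w+)>([^<]*)<\/\1>|([^<]+)/g;
  let m;
  while ((m = re.exec(innerHtml)) !== null) {
    if (m[1] !== undefined) {
      // Element node, as in the DOM's nodeType 1.
      nodes.push({ nodeType: 1, nodeName: m[1].toUpperCase(), text: m[2] });
    } else {
      // Text node, as in the DOM's nodeType 3; whitespace-only runs count.
      nodes.push({ nodeType: 3, nodeName: "#text", data: m[3] });
    }
  }
  return nodes;
}

const inner = "<p>paragraph 1</p>\n<p>paragraph 2</p>";
const kids = childNodesOf(inner);
console.log(kids.map((n) => n.nodeName).join(" "));
```

In this model the newline between the two paragraphs survives as its own
node, which is the node the quoted message reports as missing.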

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists


