From: Kevin Carhart
To: Karl Dahlke
cc: Edbrowse-dev@lists.the-brannons.com
Date: Sat, 19 Aug 2017 15:53:58 -0700 (PDT)
Subject: Re: [Edbrowse-dev] acid[0]
In-Reply-To: <20170719113834.eklhad@comcast.net>

I think we're getting into CSS here. The acid3 HTML file has a text/css section at the top that includes this:

    #instructions:last-child {
        white-space: pre-wrap;
        white-space: x-bogus;
    }

What are your feelings about CSS?
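For a sense of what a CSS parser has to do with the rule above, here is a minimal sketch. It turns one rule into a JSON-friendly object, then resolves it the way CSS error handling requires: a declaration with an unrecognized value is ignored, and among the valid declarations for a property the last one wins. So `x-bogus` should be dropped and `pre-wrap` should survive. The names `KNOWN_VALUES`, `parseRule`, and `resolve` are hypothetical, and the value table covers only `white-space`; this is an illustration, not the third-party parser or anything in edbrowse.

```javascript
// Hypothetical table of accepted keyword values (illustration only; a real
// parser would know every property and its full value grammar).
const KNOWN_VALUES = {
  "white-space": ["normal", "pre", "nowrap", "pre-wrap", "pre-line"],
};

// Parse a single "selector { prop: value; ... }" rule into a plain object.
// Assumption: no comments, strings, at-rules, nesting, or !important.
function parseRule(text) {
  const open = text.indexOf("{");
  const close = text.lastIndexOf("}");
  if (open < 0 || close < open) throw new Error("expected a single rule");
  const selector = text.slice(0, open).trim();
  const declarations = text
    .slice(open + 1, close)
    .split(";")
    .map((d) => d.trim())
    .filter((d) => d.includes(":")) // skip empty or malformed pieces
    .map((d) => {
      const colon = d.indexOf(":");
      return {
        property: d.slice(0, colon).trim().toLowerCase(),
        value: d.slice(colon + 1).trim(),
      };
    });
  return { selector, declarations };
}

// Ignore declarations whose value is not recognized; among the valid ones,
// a later declaration for the same property overrides an earlier one.
function resolve(rule) {
  const result = {};
  for (const { property, value } of rule.declarations) {
    const allowed = KNOWN_VALUES[property];
    if (allowed && allowed.includes(value.toLowerCase())) {
      result[property] = value;
    }
  }
  return result;
}

const rule = parseRule(
  "#instructions:last-child { white-space: pre-wrap; white-space: x-bogus; }"
);
console.log(JSON.stringify(rule));
console.log(JSON.stringify(resolve(rule))); // {"white-space":"pre-wrap"}
```

The JSON form keeps every declaration, including the bogus one, so any later pass can decide what "effective" means; only `resolve` applies the drop-invalid, last-wins policy.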
I have been making a claim that I think there is some evidence for, though I'm not positive: even though most of CSS is of no use to the edbrowse renderer, we might still care about CSS because sites use the presence of CSS names and values as a workaround for user-agent spoofing. The collected results of poking and prodding 100 attributes are what they take to be your real browser and OS fingerprint, overriding whatever you said it was. Diabolical, huh? Do you think this is a compelling reason to get into CSS?

I think I have found some third-party JS code that might save us work if we wanted to pursue this. One object is a CSS parser: it turns a .css file into JSON, which is easier to traverse afterwards. There is also a JS implementation of querySelectorAll, which works like getElementsByTagName except that the result elements are chosen by selector syntax rather than by tag or name. The colon, the period, and the hash mark each have fixed meanings that select different things (pseudo-classes, classes, and ids respectively).

thanks
Kevin

On Sat, 19 Aug 2017, Karl Dahlke wrote:

> With Kevin pointing the way, I started looking at the first of 100 acid tests.
> It runs into a problem in that it expects a pure whitespace node that is not there.
> Note the following html.
>
> paragraph 1
>
> paragraph 2
>
> Browsing with db5 and tidy gives us the two paragraph nodes in sequence; there is no node between them holding the newline (whitespace) character.
> The javascript expects it to be there.
> Why is it not there?
>
> Note html-tidy.c line 126.
> I tell tidy not to drop empty elements or empty paragraphs.
> Geoff, or anyone else, any insights?
>
> Karl Dahlke

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists
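On the querySelectorAll idea, here is a minimal sketch of selector-based lookup over a toy tree, to show how little is needed for the simple cases. It handles only compound selectors built from a tag, `#id`, and `.class` (no combinators, attributes, or pseudo-classes). The node shape and the helpers `el`, `text`, `parseCompound`, `matches`, and `querySelectorAll` are hypothetical; they are not edbrowse's DOM or the third-party library's code. The tree is modeled on the two-paragraph example, with a made-up id and class, and it keeps the whitespace text node between the paragraphs: selector queries skip text nodes, but raw child positions (and therefore nextSibling-style navigation) still see it.

```javascript
// Hypothetical node shapes (not edbrowse's real DOM):
//   element: { nodeType: 1, tag, id, classes, children }
//   text:    { nodeType: 3, data }
function el(tag, attrs = {}, children = []) {
  return { nodeType: 1, tag, id: attrs.id || "", classes: attrs.classes || [], children };
}
function text(data) {
  return { nodeType: 3, data };
}

// Split a compound selector such as "p.note#intro" into its parts:
// "#" introduces an id, "." a class, and a bare name is the tag.
function parseCompound(selector) {
  const parts = { tag: null, id: null, classes: [] };
  const re = /([#.]?)([A-Za-z0-9_-]+)/g;
  let m;
  while ((m = re.exec(selector)) !== null) {
    if (m[1] === "#") parts.id = m[2];
    else if (m[1] === ".") parts.classes.push(m[2]);
    else parts.tag = m[2].toLowerCase();
  }
  return parts;
}

function matches(node, sel) {
  if (node.nodeType !== 1) return false; // text nodes never match a selector
  if (sel.tag && node.tag !== sel.tag) return false;
  if (sel.id && node.id !== sel.id) return false;
  return sel.classes.every((c) => node.classes.includes(c));
}

// Depth-first walk in document order over the descendants of root.
function querySelectorAll(root, selector) {
  const sel = parseCompound(selector);
  const out = [];
  (function walk(node) {
    for (const child of node.children || []) {
      if (matches(child, sel)) out.push(child);
      walk(child);
    }
  })(root);
  return out;
}

// Toy version of the two-paragraph example; id and class are made up.
const body = el("body", {}, [
  el("p", { id: "p1" }, [text("paragraph 1")]),
  text("\n"), // the whitespace node the acid test expects
  el("p", { id: "p2", classes: ["last"] }, [text("paragraph 2")]),
]);

console.log(querySelectorAll(body, "p").length); // 2
console.log(querySelectorAll(body, "p.last")[0].id); // p2
console.log(body.children[1].nodeType); // 3
```

Because the matcher only ever returns elements, dropping or keeping the whitespace node does not change selector results; it does change what a script finds when it steps through child nodes one at a time.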