From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from out.smtp-auth.no-ip.com (smtp-auth.no-ip.com [8.23.224.61])
	by hurricane.the-brannons.com (Postfix) with ESMTPS id 067B377ADA
	for ; Sat, 15 Aug 2015 22:51:12 -0700 (PDT)
X-No-IP: carhart.net@noip-smtp
X-Report-Spam-To: abuse@no-ip.com
Received: from carhart.net (unknown [99.52.200.227])
	(Authenticated sender: carhart.net@noip-smtp)
	by smtp-auth.no-ip.com (Postfix) with ESMTPA id 55ADE4008D0;
	Sat, 15 Aug 2015 22:54:50 -0700 (PDT)
Received: from carhart.net (localhost [127.0.0.1])
	by carhart.net (8.13.8/8.13.8) with ESMTP id t7G5sn1Q010033;
	Sat, 15 Aug 2015 22:54:49 -0700
Received: from localhost (kevin@localhost)
	by carhart.net (8.13.8/8.13.8/Submit) with ESMTP id t7G5snAi010030;
	Sat, 15 Aug 2015 22:54:49 -0700
Date: Sat, 15 Aug 2015 22:54:49 -0700 (PDT)
From: Kevin Carhart
To: Chris Brannon
In-Reply-To: <87mvxt62we.fsf@mushroom.localdomain>
Message-ID:
References: <20150713234537.eklhad@comcast.net> <87mvxt62we.fsf@mushroom.localdomain>
User-Agent: Alpine 2.03 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] tidy5
X-BeenThere: edbrowse-dev@lists.the-brannons.com
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Edbrowse Development List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 16 Aug 2015 05:51:12 -0000

Chris said:

> Essentially true. The details are a bit more complicated, but this is
> the idea. We call tidy to parse the html, and we get back a structure
> from tidy called a document. It contains our tree of nodes, and we
> can iterate over it. The problem is, this is a usable parse tree for
> the html, but it isn't a true DOM. We can remove nodes and attributes
> from the tree, but we can't add them. That causes problems for JS that
> needs to add new nodes.
> So we're going to have to take that parse tree
> we get back from Tidy5, build our own DOM out of it, and eventually
> render it.

So the switch statement over (action) goes away, and the painstaking
character-by-character tag recognition goes away, but maybe in return
we need a switch statement with handling for every value, maybe grouped
together in cases, that they list as "Known HTML element types" in the
tidyenum.h file?

And would some or most of the old case blocks be preserved, such as:

the old case TAGACT_TABLE might resemble a new case TidyTag_TABLE
the old case TAGACT_TR might resemble a new case TidyTag_TR
...

So you get some of the work done by the library, but you still want a
crack at these node types, differentiated by what they are. Is that
correct?

We're still building the new string 'ns'.. hmmm... is more
standardization possible, or do you still have to do a variety of
things in order to add to ns properly?

---

Here is a second note, on what Karl said (paraphrasing): as a first
step, how about bringing libtidy into html.c, running its parse method,
and just carrying the output around as part of the ebWindow struct for
further examination, without breaking what now exists.

My note on this: I went to eb.h to see what would happen if I included
tidy.h and added a TidyDoc to the ebWindow struct. Interestingly,
because of includes pulled in by other includes, there is a name
collision when I try to compile.. I think... between mkdir in plugin.c
and mkdir in /usr/include/sys/stat.h. Uh, maybe it's a client thing
though. Disregard if it doesn't sound salient..

thanks.. this is fun.. I hope tidy will work

Kevin

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists
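
P.S. To make the switch question concrete, here is a rough sketch of
the kind of dispatch I mean. It is only an illustration, not edbrowse
code: the EB_TAG_* enum and struct node are made-up stand-ins so that it
compiles without libtidy. In the real thing I would expect the ids to be
the TidyTag_* values from tidyenum.h and the walk to go through
tidyGetRoot / tidyGetChild / tidyGetNext / tidyNodeGetId, but I have not
tried that yet. The table rendering ("|" after a cell, newline after a
row) is also invented, just to have something to append to ns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for TidyTagId values such as TidyTag_TABLE. */
typedef enum { EB_TAG_UNKNOWN, EB_TAG_TABLE, EB_TAG_TR, EB_TAG_TD, EB_TAG_P } eb_tag;

/* Hypothetical stand-in for a node of the parse tree. */
struct node {
    eb_tag id;
    const char *text;          /* text content, or NULL for an element */
    const struct node *child;  /* first child */
    const struct node *next;   /* next sibling */
};

/* Append s to the output string ns, truncating if ns is full. */
void ns_append(char *ns, size_t cap, const char *s)
{
    size_t len = strlen(ns);
    assert(cap > 0);
    if (len + 1 < cap)
        snprintf(ns + len, cap - len, "%s", s);
}

/* Walk a sibling list, recursing into children, one case per tag id. */
void render(const struct node *n, char *ns, size_t cap)
{
    for (; n != NULL; n = n->next) {
        switch (n->id) {
        case EB_TAG_TD:            /* cell: contents, then a separator */
            render(n->child, ns, cap);
            ns_append(ns, cap, "|");
            break;
        case EB_TAG_TR:            /* row and paragraph both end the line */
        case EB_TAG_P:
            render(n->child, ns, cap);
            ns_append(ns, cap, "\n");
            break;
        case EB_TAG_TABLE:         /* nothing special yet: fall through */
        default:                   /* text nodes and unhandled tags */
            if (n->text != NULL)
                ns_append(ns, cap, n->text);
            render(n->child, ns, cap);
            break;
        }
    }
}
```

Calling render() on the top node with an empty buffer fills ns in
document order. The point is only that the tag recognition disappears
and each case shrinks to "what do I add to ns for this kind of node",
which is what I was asking about above.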