From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kevin@carhart.net>
Received: from out.smtp-auth.no-ip.com (smtp-auth.no-ip.com [8.23.224.61])
 by hurricane.the-brannons.com (Postfix) with ESMTPS id 067B377ADA
 for <Edbrowse-dev@lists.the-brannons.com>;
 Sat, 15 Aug 2015 22:51:12 -0700 (PDT)
X-No-IP: carhart.net@noip-smtp
X-Report-Spam-To: abuse@no-ip.com
Received: from carhart.net (unknown [99.52.200.227])
 (Authenticated sender: carhart.net@noip-smtp)
 by smtp-auth.no-ip.com (Postfix) with ESMTPA id 55ADE4008D0;
 Sat, 15 Aug 2015 22:54:50 -0700 (PDT)
Received: from carhart.net (localhost [127.0.0.1])
 by carhart.net (8.13.8/8.13.8) with ESMTP id t7G5sn1Q010033;
 Sat, 15 Aug 2015 22:54:49 -0700
Received: from localhost (kevin@localhost)
 by carhart.net (8.13.8/8.13.8/Submit) with ESMTP id t7G5snAi010030;
 Sat, 15 Aug 2015 22:54:49 -0700
Date: Sat, 15 Aug 2015 22:54:49 -0700 (PDT)
From: Kevin Carhart <kevin@carhart.net>
To: Chris Brannon <chris@the-brannons.com>
In-Reply-To: <87mvxt62we.fsf@mushroom.localdomain>
Message-ID: <alpine.LRH.2.03.1508152212340.16516@carhart.net>
References: <alpine.LRH.2.03.1508100049370.5894@carhart.net>
 <alpine.LRH.2.03.1508131721300.18576@carhart.net>
 <20150713234537.eklhad@comcast.net> <87mvxt62we.fsf@mushroom.localdomain>
User-Agent: Alpine 2.03 (LRH 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] tidy5
X-BeenThere: edbrowse-dev@lists.the-brannons.com
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Edbrowse Development List <edbrowse-dev.lists.the-brannons.com>
List-Unsubscribe: <http://lists.the-brannons.com/mailman/options/edbrowse-dev>, 
 <mailto:edbrowse-dev-request@lists.the-brannons.com?subject=unsubscribe>
List-Archive: <http://lists.the-brannons.com/mailman/private/edbrowse-dev/>
List-Post: <mailto:edbrowse-dev@lists.the-brannons.com>
List-Help: <mailto:edbrowse-dev-request@lists.the-brannons.com?subject=help>
List-Subscribe: <http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev>, 
 <mailto:edbrowse-dev-request@lists.the-brannons.com?subject=subscribe>
X-List-Received-Date: Sun, 16 Aug 2015 05:51:12 -0000


Chris said:
> Essentially true.  The details are a bit more complicated, but this is
> the idea.  We call tidy to parse the html, and we get back a structure
> from tidy called a document.  It contains our tree of nodes, and we
> can iterate over it.  The problem is, this is a usable parse tree for
> the html, but it isn't a true DOM.  We can remove nodes and attributes
> from the tree, but we can't add them.  That causes problems for JS that
> needs to add new nodes.  So we're going to have to take that parse tree
> we get back from Tidy5, build our own DOM out of it, and eventually
> render it.

So the switch statement over (action) goes away, and so does the 
painstaking character-by-character tag recognition, but maybe in return we 
need a switch statement that handles every value listed under "Known HTML 
element types" in the tidyenum.h file, perhaps with several values grouped 
together per case?

And would some or most of the old case blocks be preserved, such as:
the old case TAGACT_TABLE might resemble a new case TidyTag_TABLE
the old case TAGACT_TR might resemble a new case TidyTag_TR
...

In other words, the library does some of the work for you, but you still 
want a crack at each node according to what type it is.  Is that correct?  
We're still building the new string 'ns'.. hmmm... is more standardization 
possible, or do you still have to do a variety of things per tag in order 
to add to ns properly?
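
To make the question concrete, here is a rough sketch in C of the shape 
being asked about: walk a read-only parse tree and switch on each node's 
tag id to decide what gets appended to ns.  Everything in it is a stand-in. 
DemoNode, the DemoTag_* values, demo_append and demo_render are hypothetical 
names, not the tidy API and not edbrowse code.  The real ids would be the 
TidyTag_* values from tidyenum.h, and what each case appends here is 
invented purely for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for a few tag ids; the real ones would be TidyTag_* values. */
typedef enum {
	DemoTag_TEXT, DemoTag_TABLE, DemoTag_TR, DemoTag_TD, DemoTag_OTHER
} DemoTagId;

/* Hypothetical read-only parse tree node: first child plus next sibling. */
typedef struct DemoNode {
	DemoTagId id;
	const char *text;		/* only used by DemoTag_TEXT nodes */
	const struct DemoNode *child;
	const struct DemoNode *next;
} DemoNode;

/* Append s to ns without overflowing; ns must already be NUL-terminated. */
static void demo_append(char *ns, size_t size, const char *s)
{
	size_t used = strlen(ns);
	if (used + 1 < size)
		snprintf(ns + used, size - used, "%s", s);
}

/* Walk siblings in order, recursing into children, switching on tag id. */
void demo_render(const DemoNode *n, char *ns, size_t size)
{
	for (; n; n = n->next) {
		switch (n->id) {
		case DemoTag_TEXT:
			if (n->text)
				demo_append(ns, size, n->text);
			break;
		case DemoTag_TD:
			demo_render(n->child, ns, size);
			demo_append(ns, size, "|");	/* invented cell separator */
			break;
		case DemoTag_TR:
			demo_render(n->child, ns, size);
			demo_append(ns, size, "\n");	/* one row per line */
			break;
		case DemoTag_TABLE:
		default:
			/* containers with no output of their own: just descend */
			demo_render(n->child, ns, size);
			break;
		}
	}
}
```

Calling demo_render on a table node whose single row holds two cells with 
the text "a" and "b" would leave "a|b|" followed by a newline in ns.  The 
grouped TABLE/default case is the "grouped together in cases" idea: tags 
that need no special handling can share one block.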


---
Here is a second note, on what Karl said (paraphrasing): as a first step, 
how about bringing libtidy into html.c, running its parse method, and just 
carrying the output around as part of the ebWindow struct, for further 
examination, without breaking what now exists?

My note on this.  I went to eb.h to see what would happen if I included 
tidy.h and added a TidyDoc to the ebWindow struct.  Interestingly, because 
of includes from includes, there is a name collision when I try to 
compile.. I think... over mkdir in plugin.c and mkdir in 
/usr/include/sys/stat.h.  Uh, maybe it's a client thing though. 
Disregard if it doesn't sound salient..
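
If the collision really does come from headers that tidy.h pulls in, one 
possible workaround is to keep tidy.h out of eb.h altogether: store the 
document handle as an untyped pointer, and cast it back only in the file 
that includes tidy.h.  A minimal sketch in C, with loud assumptions: 
demoWindow, DemoDoc, demo_set_doc and demo_get_doc are made-up names 
standing in for ebWindow, TidyDoc and whatever html.c would do.  In tidy.h, 
TidyDoc is a pointer to an opaque const struct, so it should survive the 
round trip through const void *.

```c
#include <stddef.h>

/* Shared-header side (think eb.h): note there is no #include <tidy.h>. */
struct demoWindow {
	const void *tidy_doc;	/* would hold a TidyDoc; NULL if nothing parsed */
};

/* Stand-in for TidyDoc so this sketch compiles without libtidy. */
struct demo_doc { int opaque; };
typedef const struct demo_doc *DemoDoc;

/* These two would live in the one file that does include tidy.h. */
void demo_set_doc(struct demoWindow *w, DemoDoc d)
{
	w->tidy_doc = d;
}

DemoDoc demo_get_doc(const struct demoWindow *w)
{
	return (DemoDoc) w->tidy_doc;
}
```

This only sidesteps the clash if it is caused by the extra includes; if the 
mkdir conflict lies elsewhere it would not help.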

thanks.. this is fun.. I hope tidy will work
Kevin


--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists