To: edbrowse-dev@edbrowse.org
From: Karl Dahlke
Reply-to: Karl Dahlke
User-Agent:
 edbrowse/3.8.2.1+
Subject: html scanner
Date: Wed, 20 Jul 2022 16:45:59 -0400
Message-ID: <20220620164559.eklhad@comcast.net>
List-Id: Edbrowse Development List

Eventually we reach a tipping point.
tidy is not maintained, and projects that aren't maintained soon stop being distributed.

1. Once tidy is no longer packaged, people will have to build it from source, for as long as the source remains online.
2. Building it is a pain, since you have to use cmake.
3. There are bugs in tidy that we can't fix and can't work around. At least one is an infinite loop, so this is no longer a trivial matter.
4. It is yet another dependency. The fewer dependencies the better.

With this in mind, I finally said, oh fuck it, it's time to write our own.
An html scanner isn't trivial, but it's not terribly hard either. It's not like a js engine, which is, for us, impossible!
So I've spent three days on it, and it's pretty doggone close to done.
The code is in html-tags.c.
Just three days; why didn't we do this sooner?
And it's only a little more code than the code we used to interface to tidy.
No kidding: for about the same amount of code we can roll our own.

So here's how to use it.
There is a temporary edbrowse toggle command:
tidy
So you can use tidy or not, and even compare the outputs.
Our user's guide is almost 500 lines long when rendered, and it comes out the same either way; that's pretty good.
jsrt also comes out the same, though there are some issues when trying to use it.
Four of the tests in acid3 fail using my scanner.
So sure, there are still issues, but this is clearly the way to go.
I'd like to have this working solidly, maybe in a month, then divest from tidy, then cut version 3.8.3.
We will, at that time, update our installation procedures.

So if you dare, type in tidy, then browse around like usual, and see if things blow up, or look wrong, etc.
If you're not sure, switch back to tidy, browse the same pages, and compare.

Karl Dahlke