edbrowse-dev - development list for edbrowse
From: Adam Thompson <arthompson1990@gmail.com>
To: Karl Dahlke <eklhad@comcast.net>
Cc: edbrowse-dev@edbrowse.org
Subject: Re: html scanner
Date: Wed, 27 Jul 2022 23:58:23 +0100
Message-ID: <YuHDD9H4vtge0DxY@pinebook-pro>
In-Reply-To: <20220620164559.eklhad@comcast.net>

On Wed, Jul 20, 2022 at 04:45:59PM -0400, Karl Dahlke wrote:
> Eventually we reach a tipping point.
> tidy is not maintained, and projects that aren't maintained are soon not distributed.
> 1. People will have to build tidy from source, (once it is no longer packaged), for as long as the source remains on line.
> 2. Building it is a pain since you have to use cmake.
> 3. there are bugs in tidy we can't fix, and can't work around. At least one is an infinite loop so this is no longer a trivial matter.
> 4. It is yet another dependency. The fewer dependencies the better.

It's a shame that tidy's got into such a state, but yeah, we can't depend on
unmaintained code; that's not sustainable.

> With this in mind, I finally said, oh fuck it, it's time to write our own.
> An html scanner isn't trivial, but it's not terribly hard either,
> it's not like a js engine, which is, for us, impossible!
> So I've spent three days on it, and it's pretty dog gone close to done.
> html-tags.c
> Just three days, why didn't we do this sooner?

Probably because we had our own before and switched away from it. Fortunately
we've learned from that, and this appears to be the result, which sounds
like sustainable progress.

> And it's only a little more code than the code we used to interface to tidy.
> No kidding - for the same amount of code we can roll our own.

It'll also allow us to remain current with the ever-changing world of the
web, at least in terms of new HTML tags, hopefully.

> So here's how to use it.
> There is a temporary edbrowse toggle command
> tidy
> So you can use tidy or not, and even compare the outputs.
> Our users guide is almost 500 lines long when rendered, and it comes out the same either way, that's pretty good.
> jsrt also comes out the same, though there are some issues when trying to use it.
> 4 of the tests in acid3 fail using my scanner.
> So sure there are still issues, but this is clearly the way to go.

Agreed, will start battle-testing.

> I'd like to have this working solid, maybe in a month, then divest from tidy, then cut version 3.8.3
> We will, at that time, update our installation procedures.

Sounds like a solid plan.

> So if you dare, type in tidy, then browse around like usual, and see if things blow up, or look wrong, etc.
> If you're not sure, revert back to tidy and browse and compare.

Sounds fun.


