edbrowse-dev - development list for edbrowse
* html scanner
@ 2022-07-20 20:45 Karl Dahlke
  2022-07-27 22:58 ` Adam Thompson
  0 siblings, 1 reply; 2+ messages in thread
From: Karl Dahlke @ 2022-07-20 20:45 UTC (permalink / raw)
  To: edbrowse-dev

Eventually we reach a tipping point.

tidy is not maintained, and projects that aren't maintained are soon not distributed.

1. Once tidy is no longer packaged, people will have to build it from source, for as long as the source remains online.
2. Building it is a pain since you have to use cmake.
3. There are bugs in tidy that we can't fix and can't work around. At least one is an infinite loop, so this is no longer a trivial matter.
4. It is yet another dependency. The fewer dependencies the better.

With this in mind, I finally said, oh fuck it, it's time to write our own.
An html scanner isn't trivial, but it's not terribly hard either.
It's not like a js engine, which is, for us, impossible!
So I've spent three days on it, and it's pretty doggone close to done.
The code is in html-tags.c.
Just three days, why didn't we do this sooner?
And it's only a little more code than the code we used to interface to tidy.
No kidding - for the same amount of code we can roll our own.
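
To give a feel for why the core of a scanner is tractable, here is a minimal sketch of the first step, finding tags in a string. It is not the code in html-tags.c; the names struct tag, scan_tags and TAG_NAME_MAX are made up for illustration, and it assumes away comments, doctype, and script/style contents, all of which a real scanner has to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TAG_NAME_MAX 32

/* One tag found in the input. Hypothetical type, not from edbrowse. */
struct tag {
    char name[TAG_NAME_MAX]; /* lowercased tag name, e.g. "p" */
    int closing;             /* 1 for </p>, 0 for <p> */
};

/*
 * Scan html for tags and store up to max of them in out.
 * Returns the number of tags stored.
 * Simplifications: comments, doctype, and script/style contents are
 * not treated specially, so a tag inside a comment is still reported.
 */
size_t scan_tags(const char *html, struct tag *out, size_t max)
{
    size_t n = 0;
    const char *p = html;

    assert(html != NULL && out != NULL);

    while (n < max && (p = strchr(p, '<')) != NULL) {
        size_t len = 0;
        int closing = 0;
        char quote = 0;

        p++; /* step past '<' */
        if (*p == '/') {
            closing = 1;
            p++;
        }
        /* A tag name starts with a letter; otherwise '<' was plain text. */
        if (!isalpha((unsigned char)*p))
            continue;

        /* Copy the name, lowercased, truncating if it is too long. */
        while (isalnum((unsigned char)*p) && len < TAG_NAME_MAX - 1)
            out[n].name[len++] = (char)tolower((unsigned char)*p++);
        out[n].name[len] = '\0';
        out[n].closing = closing;
        n++;

        /* Skip attributes up to the closing '>', ignoring '>' in quotes. */
        for (; *p; p++) {
            if (quote) {
                if (*p == quote)
                    quote = 0;
            } else if (*p == '"' || *p == '\'') {
                quote = *p;
            } else if (*p == '>') {
                p++;
                break;
            }
        }
    }
    return n;
}
```

Called on the string <P class="a>b">hi</p> 1 < 2 <br/>, this reports three tags: an opening p, a closing p, and an opening br. The quoted > and the bare < in the text are both skipped. Everything past this point, such as attribute parsing, nesting repair, and entity decoding, is where the real work lies.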

So here's how to use it.
There is a temporary edbrowse toggle command:
tidy
With it you can use tidy or not, and even compare the outputs.
Our user's guide is almost 500 lines long when rendered, and it comes out the same either way. That's pretty good.
jsrt also comes out the same, though there are some issues when trying to use it.
Four of the tests in acid3 fail using my scanner.
So sure there are still issues, but this is clearly the way to go.

I'd like to have this working solidly, maybe in a month, then divest from tidy, then cut version 3.8.3.
We will, at that time, update our installation procedures.

So if you dare, type in tidy, then browse around as usual, and see if things blow up, look wrong, etc.
If you're not sure, switch back to tidy, browse, and compare.

Karl Dahlke

