On Fri, Sep 11, 2015 at 06:17:13AM -0400, Karl Dahlke wrote:
> > I'm not sure what we can do about this,
> > but I'm inclined to think that whatever we do won't catch every case and that
> > at some stage we have to accept that and move on.
>
> That was true of my parser, true of tidy5, and true of any parser,
> however, as you point out regularly, we should handle most websites
> that other browsers handle.
> And when we don't,
> entire web pages shouldn't disappear beyond the point of error.
> This bug is produced by fanfiction.net and fictionpress.com,
> two high volume sites that work on every other browser.

Agreed, we need to work out what's breaking here and why it's affecting
tidy5 and not, say, Firefox etc. I may try the pages with some other HTML
parsing libs (not applicable to edbrowse, unfortunately, as they're in,
e.g., Python or Perl) to see what they do with the pages.

I'm just saying that I think we should continue to move forward with the
design on the basis that tidy5 will be fixed. If it's not, then we'll need
to look at other alternatives, but there are a lot of elements of the new
design which should stay in any case, I think.

> And by the way, my thanks to those users who exercise and test our bleeding edge software;
> you're as brave as a Windows 10 insider.

I second this. We need users to test this software, and I appreciate the
time and effort it takes to keep on top of the latest code, particularly
when we're adding library dependencies.

> In any case, tidy5 needs to fix this,
> or we need to find a way to preprocess around it,
> the latter meaning I'd have to keep at least half of my parser,
> which I really wanted to throw away entirely. :(

Maybe, or we keep the tidy-inspired design but rewrite the parsing logic,
maybe borrowing the parsing code from somewhere else and making it our own.
I know I said we should try to stay out of the HTML parsing business, and
ideally I still would like to, but if we really can't then we can at least
keep the current design direction. There has to be a parsing lib out there
somewhere which works properly... at least I hope there is.

Cheers,
Adam.