* [Edbrowse-dev] <table> <form>
From: Karl Dahlke @ 2016-12-20 19:14 UTC
To: Edbrowse-dev

Please look at www.eklhad.net/nascar.html
This is a stripped down version of an unsubscribe page that doesn't work,
which is a shame cause I'd love to unsubscribe from nascar!
The problem might be tidy.
Browse it with js off and db5.
<table> <form> <tr> seems to throw it completely off the tracks.
The form is closed as soon as <tr> comes along, and all those input items
aren't part of the form, including the last submit button,
so you just can't do a damn thing.
The tidy team might say "It's bad html syntax" and that may be true,
but we still have to parse it correctly.

Karl Dahlke
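For illustration only, here is a minimal sketch of the pattern described above, using Python's standard `html.parser`. The markup in `SAMPLE` is a made-up reduction, not the real nascar.html, and the action URL, field names, `FormTracker`, and `inputs_inside_form` are all hypothetical. It shows that, read purely in source order, both inputs sit between `<form>` and `</form>`, which is the association a tree builder loses if it closes the form at the first `<tr>`.

```python
from html.parser import HTMLParser

# Hypothetical reduction of the <table> <form> <tr> pattern; not the real page.
SAMPLE = """<table>
<form action="/unsubscribe" method="post">
<tr><td><input type="text" name="email"></td></tr>
<tr><td><input type="submit" value="Unsubscribe"></td></tr>
</form>
</table>"""


class FormTracker(HTMLParser):
    """Records the type of each <input> seen between <form> and </form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.inputs_in_form = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            self.inputs_in_form.append(dict(attrs).get("type"))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def inputs_inside_form(markup):
    """Return input types that fall inside a form, judged by source order only."""
    tracker = FormTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.inputs_in_form
```

This is an event-stream view, not a tree, so it says nothing about how tidy or any browser would nest the nodes; it only makes the author's apparent intent visible.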
* Re: [Edbrowse-dev] <table> <form>
From: Chris Brannon @ 2016-12-21 14:01 UTC
To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> Please look at www.eklhad.net/nascar.html
> This is a stripped down version of an unsubscribe page that doesn't work,

I'm waiting a bit to see if Geoff has any input on this. I don't know
whether he still follows this list. If I don't hear anything in the
next few days, I'll file an issue against the tidy5 repository.
As far as I can tell, it is not valid HTML, but maybe we can get some
kind of workaround at parse time.

-- Chris
* Re: [Edbrowse-dev] <table> <form>
From: Geoff McLane @ 2016-12-21 17:03 UTC
To: edbrowse-dev, Karl Dahlke, Chris Brannon

Hi Karl, Chris,

> Please look at www.eklhad.net/nascar.html

Yes, still casually follow the list, but do not always
find time to run a test... unless you poke me, like
now ;=))

And yes, tidy will see that as invalid html! With an
error even, so no output unless forced, but IIRC you
do add force-output...

But even if you do that, tidy will close the form,
move the script out of the table, and thus the submit
line no longer has an associated form action...

In reading around, like here -
http://stackoverflow.com/questions/5967564/form-inside-a-table
where it says -
"You can have an entire table inside a form. You can
have a form inside a table cell. You cannot have part
of a table inside a form."

But I suppose none of this helps you have a valid
'submit' button...

Yes, you could file a tidy issue, but not quite sure
what you would expect tidy to do in such a case? But
open to ideas...

Regards,
Geoff.

PS: Been so long, seems I have even forgotten the email
and pwd I used for the list, so will add direct cc to
you both... Maybe you could remind me...

On 21/12/16 15:01, Chris Brannon wrote:
> Karl Dahlke <eklhad@comcast.net> writes:
>
>> Please look at www.eklhad.net/nascar.html
>> This is a stripped down version of an unsubscribe page that doesn't work,
> I'm waiting a bit to see if Geoff has any input on this. I don't know
> whether he still follows this list. If I don't hear anything in the
> next few days, I'll file an issue against the tidy5 repository.
> As far as I can tell, it is not valid HTML, but maybe we can get some
> kind of workaround at parse time.
>
> -- Chris
* [Edbrowse-dev] <table> <form>
From: Karl Dahlke @ 2016-12-22 18:35 UTC
To: ubuntu, edbrowse-dev

> And yes, tidy will see that as invalid html!

And that's fine.

> tidy will close the form, move the script out of the table,

In an ideal world, from our point of view, it would still leave the form open.
There is a </form> later on down the page.
If tidy just can't do that, I could think about postprocessing the tree,
moving the nodes to the right of the form down to children of the form,
or some such, but every time I've tried to postmuck with the tree
I've fixed one web page and broken 8 others.
So I'm not fond of going down that path.

Karl Dahlke
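The postprocessing idea above (moving the nodes to the right of the form down to children of the form) could be sketched roughly as follows. This is an assumption-laden toy, not edbrowse's or tidy's actual tree code: the `Node` class and `adopt_right_siblings` are hypothetical names, and the sketch assumes the prematurely closed form shows up as an empty `form` node followed by its intended contents as siblings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy tree node; stands in for whatever the real parser produces."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def adopt_right_siblings(parent: Node) -> bool:
    """If parent has an empty <form> child, move every later sibling under it.

    Returns True when the tree was changed.
    """
    for i, child in enumerate(parent.children):
        if child.tag == "form" and not child.children:
            child.children = parent.children[i + 1:]
            del parent.children[i + 1:]
            return True
    return False


# Usage: a table whose form was closed before its rows.
table = Node("table", [Node("form"), Node("tr"), Node("tr")])
changed = adopt_right_siblings(table)
```

The sketch also hints at why this kind of repair breaks other pages: a form that is legitimately empty would swallow its siblings just the same, so some extra signal would be needed before applying it.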
* Re: [Edbrowse-dev] <table> <form>
From: Geoff McLane @ 2016-12-22 20:13 UTC
To: Karl Dahlke, edbrowse-dev

Hi Karl,

> In an ideal world,

LOL! Well we all know that does not exist!

Tidy does leave the form open, waiting, as it
should, for a close form, but then it hits
a tr open table element, and reports -

line 5 column 1 - Warning: missing close form
before tr

It is at this point that it *must* close the
form... and carries on parsing the table
row.. etc...

And that is why tidy emits an error when it
does eventually find a close form...

I too have had the thought - does this not
tell tidy that the earlier implicit form
close it added was not right - but what can
it do about it at that stage?

> postmuck with the tree

Yes, I hear you! That is *not* fun, and as you
point out in fixing one page, you can break so
many others...

> Using libtidy

You know, for a long time I have wondered why
you do not write your own html parser!

Not that I particularly want you to abandon
libtidy... your participation has helped solve
some libtidy problems... and so do hope you
continue...

But like any std html browser, IE, firefox, chrome,
who-ever, you are not really interested in how
well a document is formed... browsers can just skip
over many problems...

If necessary, maybe levering code from text-based
web browsers, like Lynx, but in my experimentation
with some of these, they too can get very hairy...

It is just that once you have the html text in a
buffer, it basically consists of looking for
`<` and the `>`, with not too many exceptions...

I have done this, with reasonable success, in several
perl scripts I have written... as I am sure you
probably have... like I remember in your first perl
version...

But I understand, this is a long, LONG way around...
quite an amount of new work initially...

But libtidy is always going to give you problems
when it runs into invalid html, and its efforts
to make it valid...

Just some thoughts... Sorry, can not seem to help
more...

Regards,
Geoff.
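As a rough illustration of the "looking for `<` and `>`" approach mentioned above, here is a deliberately naive scanner. It is a sketch under stated assumptions, not code from edbrowse, tidy, or the perl scripts referred to; `scan_tags` is a hypothetical name, and the function knowingly ignores the exceptions (comments, script bodies, `>` inside attribute values).

```python
def scan_tags(text):
    """Split text into ('tag', inner) and ('text', data) items.

    Naive on purpose: every '<' opens a tag and the next '>' closes it.
    """
    items = []
    pos = 0
    while pos < len(text):
        lt = text.find("<", pos)
        if lt == -1:
            # No more tags; the rest is plain text.
            items.append(("text", text[pos:]))
            break
        if lt > pos:
            items.append(("text", text[pos:lt]))
        gt = text.find(">", lt)
        if gt == -1:
            # Unterminated tag; keep the remainder as text.
            items.append(("text", text[lt:]))
            break
        items.append(("tag", text[lt + 1:gt]))
        pos = gt + 1
    return items
```

On simple input such as `<p>hi</p>` this produces a tag, a text run, and a closing tag. An attribute value containing `>` is split in the wrong place, which is one of the exceptions a real parser has to handle.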
* Re: [Edbrowse-dev] <table> <form>
From: Adam Thompson @ 2016-12-25 12:53 UTC
To: Geoff McLane
Cc: Karl Dahlke, edbrowse-dev

On Thu, Dec 22, 2016 at 09:13:32PM +0100, Geoff McLane wrote:
> > In an ideal world,
>
> LOL! Well we all know that does not exist!

Yep that's certainly true.

> Tidy does leave the form open, waiting, as it
> should, for a close form, but then it hits
> a tr open table element, and reports -
>
> line 5 column 1 - Warning: missing close form
> before tr
>
> It is at this point that it *must* close the
> form... and carries on parsing the table
> row.. etc...
>
> And that is why tidy emits an error when it
> does eventually find a close form...
>
> I too have had the thought - does this not
> tell tidy that the earlier implicit form
> close it added was not right - but what can
> it do about it at that stage?
>
> > postmuck with the tree
>
> Yes, I hear you! That is *not* fun, and as you
> point out in fixing one page, you can break so
> many others...

Agreed. The only way I can think of around this would be for tidy to keep
track of any missing close tags and then "fix" its tree once it finds the
closing tag. This'd be messy though and fairly difficult to do well, but
would allow the forced output mode to produce complete forms etc.
That being said I'm not sure how many pages that'd break... probably many.

> > Using libtidy
>
> You know, for a long time I have wondered why
> you do not write your own html parser!

We had one for quite a while but it got harder to maintain as new elements
were supported and then html5 happened.

> Not that I particularly want you to abandon
> libtidy... your participation has helped solve
> some libtidy problems... and so do hope you
> continue...
>
> But like any std html browser, IE, firefox, chrome,
> who-ever, you are not really interested in how
> well a document is formed... browsers can just skip
> over many problems...

True, but tidy can repair most of them, which is very useful.
It's also a fully validating html parser which, although causing some
problems with invalid pages, gives us support for a lot of html which'd
otherwise take quite a bit of work and maintenance.

> If necessary, maybe levering code from text-based
> web browsers, like Lynx, but in my experimentation
> with some of these, they too can get very hairy...

Yes, and adding support for dynamic page elements only makes things worse
in that regard. In addition, just skipping over problems means one then
needs to work around them somehow. This may take the form of ignoring them,
but most of the time, particularly with js, some sort of special casing
would be required. This is why repairing things (see my above comment) is
so useful I think.

> It is just that once you have the html text in a
> buffer, it basically consists of looking for
> `<` and the `>`, with not too many exceptions...
>
> I have done this, with reasonable success, in several
> perl scripts I have written... as I am sure you
> probably have... like I remember in your first perl
> version...
>
> But I understand, this is a long, LONG way around...
> quite an amount of new work initially...
>
> But libtidy is always going to give you problems
> when it runs into invalid html, and its efforts
> to make it valid...

No more problems imho than we'd experience in getting a valid node tree
from this kind of thing. This, actually, isn't as bad as I've seen since
the form is actually closed. I wonder if, in our case, we could detect from
the tidy output that there is actually a closing tag somewhere and then
attempt to post-process as Karl suggested (maybe print a warning and then
have a command or option to disable this for pages where it breaks)?
Any thoughts?

Cheers,
Adam.
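The detection step floated above (notice from the tidy output that a form was closed early, check that a closing tag really exists later, and allow the repair to be switched off) might look something like this sketch. It assumes diagnostics arrive as text in the shape quoted in this thread ("line 5 column 1 - Warning: missing close form before tr"); real tidy output may be worded differently. `DIAG`, `implied_form_close_line`, `should_repair`, and the `allow_repair` toggle are hypothetical names, not existing edbrowse or tidy options.

```python
import re

# Shape of a diagnostic as quoted in this thread; actual wording may differ.
DIAG = re.compile(r"line (\d+) column (\d+) - (\w+): (.*)")


def implied_form_close_line(diagnostics):
    """Return the source line where a form was implicitly closed, or None."""
    for entry in diagnostics.splitlines():
        m = DIAG.match(entry.strip())
        if m and "missing" in m.group(4) and "form" in m.group(4):
            return int(m.group(1))
    return None


def should_repair(diagnostics, source, allow_repair=True):
    """True when a form was closed early but the page closes it explicitly later."""
    if not allow_repair:  # hypothetical user toggle for pages where repair breaks
        return False
    closed_at = implied_form_close_line(diagnostics)
    if closed_at is None:
        return False
    lines = source.splitlines()
    # Look for an explicit </form> on or after the reported line (1-based).
    return any("</form>" in line.lower() for line in lines[closed_at - 1:])
```

Keeping the decision separate from the tree surgery means the warning and the disable option can be handled in one place, whatever form the actual post-processing takes.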