edbrowse-dev - development list for edbrowse
* [Edbrowse-dev] <table> <form>
@ 2016-12-20 19:14 Karl Dahlke
  2016-12-21 14:01 ` Chris Brannon
  0 siblings, 1 reply; 6+ messages in thread
From: Karl Dahlke @ 2016-12-20 19:14 UTC (permalink / raw)
  To: Edbrowse-dev

Please look at   www.eklhad.net/nascar.html
This is a stripped down version of an unsubscribe page that doesn't work,
which is a shame, because I'd love to unsubscribe from nascar!
The problem might be tidy.
Browse it with js off and db5.
<table> <form> <tr> seems to throw it completely off the tracks.
The form is closed as soon as <tr> comes along, and all those input items aren't part of the form, including the last submit button, so you just can't do a damn thing.
The tidy team might say "It's bad html syntax" and that may be true, but we still have to parse it correctly.
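
To make the expectation concrete, here is a minimal sketch of "form membership in source order": every input between <form> and </form> counts as part of the form, whatever the table markup around it looks like. The SAMPLE markup, the field names, and the helper names are assumptions made up for illustration; the real nascar.html is not reproduced here, and this is not how edbrowse or tidy decide membership.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the problem page: <table> <form> <tr> ...
SAMPLE = """
<table>
<form action="/unsubscribe" method="post">
<tr><td><input type="text" name="email"></td></tr>
<tr><td><input type="submit" name="go" value="Unsubscribe"></td></tr>
</form>
</table>
"""


class FormMembers(HTMLParser):
    """Sort inputs by whether they appear between <form> and </form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.inside = []   # names of inputs seen while the form is open
        self.outside = []  # names of inputs seen with no form open

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input":
            name = dict(attrs).get("name")
            (self.inside if self.in_form else self.outside).append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_members(markup):
    """Return (inside, outside) lists of input names for the markup."""
    parser = FormMembers()
    parser.feed(markup)
    parser.close()
    return parser.inside, parser.outside
```

Under this reading, form_members(SAMPLE) puts both "email" and "go" inside the form, which is the result a user needs in order to submit the page.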

Karl Dahlke

* Re: [Edbrowse-dev] <table> <form>
  2016-12-20 19:14 [Edbrowse-dev] <table> <form> Karl Dahlke
@ 2016-12-21 14:01 ` Chris Brannon
  2016-12-21 17:03   ` Geoff McLane
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Brannon @ 2016-12-21 14:01 UTC (permalink / raw)
  To: Edbrowse-dev

Karl Dahlke <eklhad@comcast.net> writes:

> Please look at   www.eklhad.net/nascar.html
> This is a stripped down version of an unsubscribe page that doesn't work,

I'm waiting a bit to see if Geoff has any input on this.  I don't know
whether he still follows this list.  If I don't hear anything in the
next few days, I'll file an issue against the tidy5 repository.
As far as I can tell, it is not valid HTML, but maybe we can get some
kind of workaround at parse time.

-- Chris

* Re: [Edbrowse-dev] <table> <form>
  2016-12-21 14:01 ` Chris Brannon
@ 2016-12-21 17:03   ` Geoff McLane
  2016-12-22 18:35     ` Karl Dahlke
  0 siblings, 1 reply; 6+ messages in thread
From: Geoff McLane @ 2016-12-21 17:03 UTC (permalink / raw)
  To: edbrowse-dev, Karl Dahlke, Chris Brannon

Hi Karl, Chris,

 > Please look at www.eklhad.net/nascar.html

Yes, still casually follow the list, but do
not always find time to run a test... unless
you poke me, like now ;=))

And yes, tidy will see that as invalid html!

With an error even, so no output unless forced,
but IIRC you do add force-output...

But even if you do that, tidy will close the
form, move the script out of the table, and
thus the submit line no longer has an
associated form action...

In reading around, like here -
http://stackoverflow.com/questions/5967564/form-inside-a-table
where it says -
"You can have an entire table inside a form. You can have
a form inside a table cell. You cannot have part of a table
inside a form."
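
The rule quoted above can be approximated mechanically. The following is a rough sketch, assuming it is enough to flag any <form> whose innermost open element is a table structure element rather than a cell. The class and function names are hypothetical, and the open-element stack is far cruder than what a real HTML parser keeps.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# A form opened directly under one of these wraps "part of a table".
TABLE_STRUCTURE = {"table", "thead", "tbody", "tfoot", "tr"}


class FormPlacement(HTMLParser):
    """Flag forms that open directly inside table structure."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.misplaced = []  # parent tag of each misplaced form

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self.stack and self.stack[-1] in TABLE_STRUCTURE:
            self.misplaced.append(self.stack[-1])
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray close tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def misplaced_forms(markup):
    """Return the parent tag of every form opened outside a cell."""
    parser = FormPlacement()
    parser.feed(markup)
    parser.close()
    return parser.misplaced
```

A whole table inside a form, or a form inside a td, yields an empty list; <table><form><tr>... yields ["table"].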

But I suppose none of this helps you have a
valid 'submit' button...

Yes, you could file a tidy issue, but not quite
sure what you would expect tidy to do in such a
case? But open to ideas...

Regards, Geoff.

PS: Been so long, seems I have even forgotten the
email and pwd I used for the list, so will add
direct cc to you both...  Maybe you could remind me...


On 21/12/16 15:01, Chris Brannon wrote:
> Karl Dahlke <eklhad@comcast.net> writes:
>
>> Please look at   www.eklhad.net/nascar.html
>> This is a stripped down version of an unsubscribe page that doesn't work,
> I'm waiting a bit to see if Geoff has any input on this.  I don't know
> whether he still follows this list.  If I don't hear anything in the
> next few days, I'll file an issue against the tidy5 repository.
> As far as I can tell, it is not valid HTML, but maybe we can get some
> kind of workaround at parse time.
>
> -- Chris


* [Edbrowse-dev]  <table> <form>
  2016-12-21 17:03   ` Geoff McLane
@ 2016-12-22 18:35     ` Karl Dahlke
  2016-12-22 20:13       ` Geoff McLane
  0 siblings, 1 reply; 6+ messages in thread
From: Karl Dahlke @ 2016-12-22 18:35 UTC (permalink / raw)
  To: ubuntu, edbrowse-dev

> And yes, tidy will see that as invalid html!

And that's fine.

> tidy will close the form, move the script out of the table,

In an ideal world, from our point of view, it would still leave the form open.
There is a </form> later on down the page.
If tidy just can't do that, I could think about postprocessing the tree,
moving the nodes to the right of the form down to children of the form,
or some such, but every time I've tried to postmuck with the tree
I've fixed one web page and broken 8 others.
So I'm not fond of going down that path.
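
For concreteness, the postprocessing idea could look roughly like this. It is only a sketch on a toy tree: the Node class and adopt_right_siblings are hypothetical names, and the assumed shape (an empty form followed by the rows as its right-hand siblings) is a guess, not tidy's actual output.

```python
class Node:
    """Toy tree node; a stand-in for whatever the real parse tree uses."""

    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = list(children or [])


def adopt_right_siblings(parent, form):
    """Move every sibling to the right of `form` under `parent` into `form`.

    Returns the number of nodes moved.
    """
    # Locate the form by identity, not equality.
    idx = next(i for i, child in enumerate(parent.children) if child is form)
    moved = parent.children[idx + 1:]
    del parent.children[idx + 1:]
    form.children.extend(moved)
    return len(moved)


# Assumed shape after the form was closed early: table -> [form, tr, tr]
form = Node("form")
table = Node("table", [form, Node("tr"), Node("tr")])
count = adopt_right_siblings(table, form)
```

After the call the table holds only the form, and the form holds both rows. A form that really was meant to end early would swallow its siblings the same way, which is exactly the fix-one-break-others risk.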


Karl Dahlke

* Re: [Edbrowse-dev] <table> <form>
  2016-12-22 18:35     ` Karl Dahlke
@ 2016-12-22 20:13       ` Geoff McLane
  2016-12-25 12:53         ` Adam Thompson
  0 siblings, 1 reply; 6+ messages in thread
From: Geoff McLane @ 2016-12-22 20:13 UTC (permalink / raw)
  To: Karl Dahlke, edbrowse-dev

Hi Karl,

 > In an ideal world,

LOL! Well we all know that does not exist!

Tidy does leave the form open, waiting, as it
should, for a close form, but then it hits
a tr open table element, and reports -

line 5 column 1 - Warning: missing close form
before tr

It is at this point that it *must* close the
form... and carries on parsing the table
row.. etc...

And that is why tidy emits an error when it
does eventually find a close form...
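
The sequence described above can be mimicked with a toy model. This is not tidy's code and the event strings are made up for illustration; it only assumes the behaviour as described here: an open form is closed the moment a tr starts, so the real close tag later finds nothing to close. It also ignores the legitimate case of a nested table inside a form.

```python
from html.parser import HTMLParser


class ImplicitClose(HTMLParser):
    """Toy model: close an open form as soon as a <tr> starts."""

    def __init__(self):
        super().__init__()
        self.form_open = False
        self.events = []  # illustrative event log, not tidy's messages

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_open = True
            self.events.append("open form")
        elif tag == "tr" and self.form_open:
            # The forced close: the form ends before the row begins.
            self.form_open = False
            self.events.append("implicit close of form before tr")
        elif tag == "input":
            where = "inside" if self.form_open else "outside"
            self.events.append("input " + where + " form")

    def handle_endtag(self, tag):
        if tag == "form":
            if self.form_open:
                self.form_open = False
                self.events.append("close form")
            else:
                # The form was already closed, so this tag matches nothing.
                self.events.append("stray close form")


def simulate(markup):
    """Return the event log produced by the toy model for the markup."""
    parser = ImplicitClose()
    parser.feed(markup)
    parser.close()
    return parser.events
```

For <table><form><tr><td><input name="x"></td></tr></form></table> the log is: open form, implicit close of form before tr, input outside form, stray close form. By the time the stray close appears, the input has already been placed outside the form.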

I too have had the thought - does this not
tell tidy that the earlier implicit form
close it added was not right - but what can
it do about it at that stage?

 > postmuck with the tree

Yes, I hear you! That is *not* fun, and as you
point out in fixing one page, you can break so
many others...

 > Using libtidy

You know, for a long time I have wondered why
you do not write your own html parser!

Not that I particularly want you to abandon
libtidy... your participation has helped solve
some libtidy problems... and so do hope you
continue...

But like any std html browser, IE, firefox, chrome,
who-ever, you are not really interested in how
well a document is formed... browsers can just skip
over many problems...

If necessary, maybe levering code from text-based
web browsers, like Lynx, but in my experimentation
with some of these, they too can get very hairy...

It is just that once you have the html text in a
buffer, it basically consists of looking for
`<` and the `>`, with not too many exceptions...
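
As a rough illustration of that scan, assuming nothing beyond plain string searching (scan_tags is a hypothetical name, and this is not the old edbrowse parser or any of the perl scripts):

```python
def scan_tags(buf):
    """Split an HTML buffer into ("tag", ...) and ("text", ...) pieces."""
    pieces = []
    pos = 0
    while pos < len(buf):
        lt = buf.find("<", pos)
        if lt == -1:
            # No more tags: the rest is text.
            pieces.append(("text", buf[pos:]))
            break
        if lt > pos:
            pieces.append(("text", buf[pos:lt]))
        gt = buf.find(">", lt)
        if gt == -1:
            # Unterminated tag: treat the remainder as text.
            pieces.append(("text", buf[lt:]))
            break
        pieces.append(("tag", buf[lt + 1:gt]))
        pos = gt + 1
    return pieces
```

scan_tags("<p>Hi <b>there</b></p>") gives the tags p, b, /b, /p with the text pieces "Hi " and "there" in between. One of the exceptions shows up at once: a `>` inside a quoted attribute value, as in <a title="1>0">, ends the tag too early, and script bodies and comments need similar care.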

I have done this, with reasonable success, in several
perl scripts I have written... as I am sure you
probably have... like I remember in your first perl
version...

But I understand, this is a long, LONG way around...
quite an amount of new work initially...

But libtidy is always going to give you problems
when it runs into invalid html, and its efforts
to make it valid...

Just some thoughts... Sorry, can not seem to help
more...

Regards, Geoff.


* Re: [Edbrowse-dev] <table> <form>
  2016-12-22 20:13       ` Geoff McLane
@ 2016-12-25 12:53         ` Adam Thompson
  0 siblings, 0 replies; 6+ messages in thread
From: Adam Thompson @ 2016-12-25 12:53 UTC (permalink / raw)
  To: Geoff McLane; +Cc: Karl Dahlke, edbrowse-dev

On Thu, Dec 22, 2016 at 09:13:32PM +0100, Geoff McLane wrote:
> > In an ideal world,
> 
> LOL! Well we all know that does not exist!

Yep that's certainly true.

> Tidy does leave the form open, waiting, as it
> should, for a close form, but then it hits
> a tr open table element, and reports -
> 
> line 5 column 1 - Warning: missing close form
> before tr
> 
> It is at this point that it *must* close the
> form... and carries on parsing the table
> row.. etc...
> 
> And that is why tidy emits an error when it
> does eventually find a close form...
> 
> I too have had the thought - does this not
> tell tidy that the earlier implicit form
> close it added was not right - but what can
> it do about it at that stage?
> 
> > postmuck with the tree
> 
> Yes, I hear you! That is *not* fun, and as you
> point out in fixing one page, you can break so
> many others...

Agreed.  The only way I can think of around this would be for tidy to keep
track of any missing close tags and then "fix" its tree once it finds the
closing tag.  This'd be messy though and fairly difficult to do well, but would
allow the forced output mode to produce complete forms etc.  That being said I'm
not sure how many pages that'd break... probably many.

> > Using libtidy
> 
> You know, for a long time I have wondered why
> you do not write your own html parser!

We had one for quite a while but it got harder to maintain as new elements
were supported and then html5 happened.

> Not that I particularly want you to abandon
> libtidy... your participation has helped solve
> some libtidy problems... and so do hope you
> continue...
> 
> But like any std html browser, IE, firefox, chrome,
> who-ever, you are not really interested in how
> well a document is formed... browsers can just skip
> over many problems...

True, but tidy can repair most of them, which is very useful.  It's also
a full validating html parser which, although it causes some problems with invalid
pages, gives us support for a lot of html which'd otherwise take quite a bit of
work and maintenance.

> If necessary, maybe levering code from text-based
> web browsers, like Lynx, but in my experimentation
> with some of these, they too can get very hairy...

Yes, and adding support for dynamic page elements only makes things worse in
that regard.  In addition, just skipping over problems means one then needs to
work around them somehow.  This may take the form of ignoring them, but most of
the time, particularly with js, some sort of special casing would be required.
This is why repairing things (see my above comment) is so useful I think.

> It is just that once you have the html text in a
> buffer, it basically consists of looking for
> `<` and the `>`, with not too many exceptions...
> 
> I have done this, with reasonable success, in several
> perl scripts I have written... as I am sure you
> probably have... like I remember in your first perl
> version...
> 
> But I understand, this is a long, LONG way around...
> quite an amount of new work initially...
> 
> But libtidy is always going to give you problems
> when it runs into invalid html, and its efforts
> to make it valid...

No more problems imho than we'd experience in getting a valid node tree from
this kind of thing.  This, actually, isn't as bad as I've seen since the form is
actually closed.  I wonder if, in our case, we could detect from the tidy output
that there is actually a closing tag somewhere and then attempt to
post-process as Karl suggested (maybe print a warning and then have a command
or option to disable this for pages where it breaks)?

Any thoughts?

Cheers,
Adam.
