edbrowse-dev - development list for edbrowse
From: Kevin Carhart <kevin@carhart.net>
To: Karl Dahlke <eklhad@comcast.net>
Cc: edbrowse-dev@lists.the-brannons.com
Subject: Re: [Edbrowse-dev] script tags in scripts
Date: Thu, 10 Sep 2015 22:28:03 -0700 (PDT)
Message-ID: <alpine.LRH.2.03.1509102200340.19704@carhart.net>
In-Reply-To: <20150810211023.eklhad@comcast.net>



Interesting. Karl, does your certainty mean that the distinction
between the two tags is fundamentally unknowable for a parser?

One good sign is that there appears to be a lot of past
discussion of this issue on the Tidy mailing lists, including
a 2006 thread called "Tidy barfs on split <SCRIPT> tags".
Unless it's an impossible problem, these past threads may
contain something we can use.  I will read through some of
this correspondence.

This reminds me of other gnarly situations with literals.
For instance, when a regular expression pattern held in a
JavaScript string consists of nothing but a lone close brace
or close parenthesis, and I come along and make assumptions
about braces coming in pairs, the unmatched literal gets me
out of sync.
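
To make the out-of-sync problem concrete, here is a rough sketch.
The function names are hypothetical and nothing here comes from
edbrowse or Tidy.  It only skips single- and double-quoted string
literals, so regex literals, template literals and comments would
still throw it off.

```javascript
// Hypothetical helpers for illustration only; not part of edbrowse.

// Naive count: every brace is counted, even one inside a string literal.
function naiveBraceDepth(src) {
  let depth = 0;
  for (const ch of src) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
  }
  return depth;
}

// Same count, but braces inside '...' or "..." string literals are skipped.
// Regex literals, template literals and comments are NOT handled.
function stringAwareBraceDepth(src) {
  let depth = 0;
  let quote = null; // the quote character that opened the current string
  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (quote !== null) {
      if (ch === "\\") i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // the string literal ends here
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
    }
  }
  return depth;
}

// The string "}" is a lone close brace that has no partner.
const sample = 'function f(s) { return s.split("}"); }';
console.log(naiveBraceDepth(sample));       // -1
console.log(stringAwareBraceDepth(sample)); // 0
```

The naive counter ends up one brace short because of the literal,
while the string-aware one stays balanced.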

Kevin


On Thu, 10 Sep 2015, Karl Dahlke wrote:

> I'm fairly certain, and fairly concerned, that this is a tidy bug
> that we can't get around.
> Source as follows.
>
> <body>
> <script>document.write("<script></s");document.write("cript>")</script>
> <p>paragraph</p>
> </body>
>
> db6
> js
> b
>
> undoCompare no undo map
> line 1 column 1: missing <!DOCTYPE> declaration
> line 2 column 34: '<' + '/' + letter not allowed here
> line 2 column 69: '<' + '/' + letter not allowed here
> line 3 column 14: '<' + '/' + letter not allowed here
> line 4 column 5: '<' + '/' + letter not allowed here
> line 2 column 1: missing </script>
> line 2 column 1: missing </script>
> line 1 column 1: inserting missing 'title' element
> Node(0): Root {
> Node(1): DOCTYPE {
> @PUBLIC = (null)
> }
> Node(1): html {
> Node(2): head {
> Node(3): meta {
> @name = generator
> @content = HTML Tidy for HTML5 for Linux/x86 version 5.1.2
> }
> Node(3): title {
> }
> }
> Node(2): body {
> Node(3): script {
> Node(4): Text {
> Text: document.write("<script><\/s");document.write("cript>")<\/script>
> <p>paragraph<\/p>
> <\/body>
>
> }
> }
> }
> }
> }
> ||
>
> So you see all the text is subsumed under the script tag.
> And slashes are escaped.
> Tidy doesn't grasp the </script> terminator.
> Thoughts?
>
> Karl Dahlke
> _______________________________________________
> Edbrowse-dev mailing list
> Edbrowse-dev@lists.the-brannons.com
> http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev
>
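
On the </script> terminator in Karl's sample: as I understand the
HTML5 tokenizer rules, script text ends only at "</script" followed
by whitespace, "/" or ">", so a stray "</s" inside a string would not
end it.  The "'<' + '/' + letter" warnings look more like the older
HTML 4 rule, though that is my guess.  A minimal sketch, assuming
that reading of the rules; findScriptEnd is a made-up name, and the
sketch ignores the "<!--" escaped states.

```javascript
// Hypothetical helper: a simplified version of the HTML5 "script data"
// end-of-script rule. It ignores the "<!--" escaped states in the spec.
function findScriptEnd(html, from) {
  // "</script" (any case) followed by whitespace, "/" or ">".
  const re = /<\/script[\t\n\f\r \/>]/gi;
  re.lastIndex = from;
  const m = re.exec(html);
  return m === null ? -1 : m.index;
}

// Karl's sample source from the quoted message.
const page =
  '<body>\n' +
  '<script>document.write("<script></s");document.write("cript>")</script>\n' +
  '<p>paragraph</p>\n' +
  '</body>\n';

const start = page.indexOf("<script>") + "<script>".length;
const end = findScriptEnd(page, start);
console.log(page.slice(start, end));
// document.write("<script></s");document.write("cript>")
```

Under that rule the script body stops before the paragraph, which is
where the source intends it to stop.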

--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists
