* [Edbrowse-dev] tidy script bug workaround
From: Karl Dahlke @ 2015-09-13 1:22 UTC (permalink / raw)
To: Edbrowse-dev
Well if they've been talking about this bug since 2007,
then they might not fix it soon.
With that in mind I was thinking about a preprocessing workaround.
My first attempt looked for strings within scripts,
and then <script within a string,
but that was completely derailed by
if(whatever.match(/"/)) { do stuff }
The bare regexps make this approach impossible.
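To illustrate why the string-based approach falls over, here is a minimal sketch of a naive quote tracker. It is not the code from the first attempt; the function name open_quote_at_end is hypothetical. It follows '...' and "..." strings and backslash escapes, but knows nothing about regexp literals, so the lone quote in /"/ leaves it believing a string is still open.

```c
/* Naive scanner (illustrative only): tracks '...' and "..." strings and
 * backslash escapes, but has no notion of regexp literals such as /"/.
 * Returns the quote character it believes is still open at the end of js,
 * or 0 if it believes every string was closed. */
char open_quote_at_end(const char *js)
{
	char open = 0;

	for (; *js; ++js) {
		if (*js == '\\' && js[1]) {
			++js;	/* skip the escaped character */
			continue;
		}
		if (open) {
			if (*js == open)
				open = 0;	/* matching close quote */
		} else if (*js == '"' || *js == '\'') {
			open = *js;	/* a string appears to start here */
		}
	}
	return open;
}
```

Fed the line above, the scanner reports an open double quote at the end, so everything after the regexp would be misclassified as string content.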
So I made the routine even simpler, and I think safer and better.
If it flags a false positive, then the script won't compile and won't run,
which is better than running and doing the wrong thing.
Normally I just push stuff without review,
in the interest of getting things done,
which is perhaps arrogant, and I apologize for that,
but this one I think people should look at first.
It is ready to push if you say go.
The new string is longer than the original,
so I have to use all those dynamic string functions.
/* Work around a nasty bug in tidy5 wherein "<script>" anywhere
 * in a javascript will totally derail things.
 * I turn < into \x3c. */
static char *escapeLessScript(const char *htmltext)
{
	char *ns;		/* new string */
	int ns_l;
	const char *s1, *s2;	/* start and end of script */
	const char *lw;		/* last write */
	const char *q;		/* inner script */

	ns = initString(&ns_l);
	lw = htmltext;

	while (true) {
		s1 = strstrCI(lw, "<script");
		if (!s1)
			break;
		// printf("@@%s", s1);
		s1 += 7;
		if (isalnumByte(*s1)) {	/* <scriptx */
			stringAndBytes(&ns, &ns_l, lw, s1 - lw);
			lw = s1;
			continue;
		}
		s2 = strstrCI(s1, "</script");
		if (!s2)
			goto abort;
		/* script now has a start and end */
		stringAndBytes(&ns, &ns_l, lw, s1 - lw);
		lw = s1;
		while (true) {
			q = strstrCI(lw, "<script");
			if (!q || q > s2)
				break;
			stringAndBytes(&ns, &ns_l, lw, q - lw);
			stringAndString(&ns, &ns_l, "\\x3c");
			lw = q + 1;
		}
		stringAndBytes(&ns, &ns_l, lw, s2 - lw);
		lw = s2;
	}

	stringAndString(&ns, &ns_l, lw);
	return ns;

abort:
	nzFree(ns);
	return 0;
}				/* escapeLessScript */
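For anyone who wants to try the idea outside the edbrowse tree, the following is a rough standalone sketch of the same technique using only the standard C library. It is an approximation, not the pushed code: find_ci, put and escape_less_script are hypothetical stand-ins for strstrCI, stringAndBytes and the routine above, and a fixed worst-case buffer replaces the dynamic string functions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive strstr; assumed stand-in for edbrowse's strstrCI. */
static const char *find_ci(const char *hay, const char *needle)
{
	size_t n = strlen(needle);

	for (; *hay; ++hay) {
		size_t i = 0;
		while (i < n && hay[i] &&
		       tolower((unsigned char)hay[i]) ==
		       tolower((unsigned char)needle[i]))
			++i;
		if (i == n)
			return hay;
	}
	return NULL;
}

/* Append n bytes from src at *out and advance *out. */
static void put(char **out, const char *src, size_t n)
{
	memcpy(*out, src, n);
	*out += n;
}

/* Returns a malloc'd copy of html in which every "<script" found inside a
 * script body has its '<' rewritten as \x3c. Returns NULL if a script is
 * never closed or if allocation fails. The caller frees the result. */
char *escape_less_script(const char *html)
{
	/* Each escaped '<' grows from 1 byte to 4, so 4x is a safe bound. */
	char *ns = malloc(4 * strlen(html) + 1);
	char *out = ns;
	const char *lw = html;	/* last write */
	const char *s1, *s2, *q;

	if (!ns)
		return NULL;
	while ((s1 = find_ci(lw, "<script")) != NULL) {
		s1 += 7;
		put(&out, lw, (size_t)(s1 - lw));
		lw = s1;
		if (isalnum((unsigned char)*s1))
			continue;	/* <scriptx..., not a script tag */
		s2 = find_ci(s1, "</script");
		if (!s2) {
			free(ns);
			return NULL;	/* unterminated script */
		}
		/* escape each <script between the open and close tags */
		while ((q = find_ci(lw, "<script")) != NULL && q < s2) {
			put(&out, lw, (size_t)(q - lw));
			put(&out, "\\x3c", 4);
			lw = q + 1;	/* skip the '<' just escaped */
		}
		put(&out, lw, (size_t)(s2 - lw));
		lw = s2;
	}
	strcpy(out, lw);	/* copy the tail, including the terminator */
	return ns;
}
```

Passing a page whose script contains var a = "<script>"; should give back the same page with that string literal reading "\x3cscript>", while text outside script bodies is copied through unchanged.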
* Re: [Edbrowse-dev] tidy script bug workaround
From: Chris Brannon @ 2015-09-13 8:17 UTC (permalink / raw)
To: Edbrowse-dev
Karl Dahlke <eklhad@comcast.net> writes:
> Well if they've been talking about this bug since 2007,
> then they might not fix it soon.
Yeah, but on the other hand, some of that discussion was from the
original tidy project, which has been more-or-less dead for some years
now. Tidy 5 is new enough that I'm hopeful.
Seems that Kevin has been interacting with their issue tracker and
possibly their team; I'm happy to do what I can to help on that front also.
> but this one I think people should look at first.
> It is ready to push if you say go.
I'm giving this a very tentative go. I read through it a couple times.
I don't see anything obviously wrong, but yeah, it's tricky.
PS. I'm seeing more and more examples of this in the wild, including
one on a BBC news site.
-- Chris
* [Edbrowse-dev] tidy script bug workaround
From: Karl Dahlke @ 2015-09-13 14:46 UTC (permalink / raw)
To: Edbrowse-dev
> I don't see anything obviously wrong, but yeah, it's tricky.
Ok, it's pushed, try it on your various examples.
> One of us should get on there, let them know about us, and see if we can
> get something to happen. I'm glad to do it unless someone else feels
> more diplomatic.
No one is more diplomatic than you.
Karl Dahlke
* Re: [Edbrowse-dev] tidy script bug workaround
From: Adam Thompson @ 2015-09-13 16:37 UTC (permalink / raw)
To: Karl Dahlke; +Cc: Edbrowse-dev
On Sun, Sep 13, 2015 at 10:46:02AM -0400, Karl Dahlke wrote:
> > I don't see anything obviously wrong, but yeah, it's tricky.
>
> Ok, it's pushed, try it on your various examples.
Doesn't this miss the also destructive </script problem?
Cheers,
Adam.
* [Edbrowse-dev] tidy script bug workaround
From: Karl Dahlke @ 2015-09-13 17:05 UTC (permalink / raw)
To: Edbrowse-dev
> Doesn't this miss the also destructive </script problem?
Yes it does, but "</script>" does not seem to appear in the wild.
In other words, web developers and generators are careful not to crank out
the string "...</script>..." because </script> anywhere ends the script.
My parser has made this assumption for ten years, and it's pretty reliable.
They're more than happy to write
var a = "<script>";
but not
var a = "</script>";
the latter often written as
var a = "</scr" + "ipt>";
This too generates a tidy warning, because </s shouldn't appear,
or </ any letter for that matter,
but it does not cause trouble and does not derail the script.
So my early research suggests we're ok here.
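As a rough illustration of the "</script> anywhere ends the script" rule, here is a small sketch that measures a script body by stopping at the first </script in any letter case. The name script_body_length is hypothetical and this is not the edbrowse parser, just the assumption it relies on written out.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the script body under the rule "the first </script, in any
 * letter case, ends the script". body points just past the opening tag.
 * Returns (size_t)-1 if no closing tag is found. */
size_t script_body_length(const char *body)
{
	static const char end_tag[] = "</script";
	const char *p;

	for (p = body; *p; ++p) {
		size_t i = 0;
		while (end_tag[i] && p[i] &&
		       tolower((unsigned char)p[i]) == end_tag[i])
			++i;
		if (!end_tag[i])	/* matched the whole end tag */
			return (size_t)(p - body);
	}
	return (size_t)-1;
}
```

With the split form "</scr" + "ipt>" the whole statement stays inside the body, whereas a literal "</script>" in a string cuts the body off right after the opening quote.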
Karl Dahlke
* Re: [Edbrowse-dev] tidy script bug workaround
From: Adam Thompson @ 2015-09-14 17:05 UTC (permalink / raw)
To: Karl Dahlke; +Cc: Edbrowse-dev
On Sun, Sep 13, 2015 at 01:05:45PM -0400, Karl Dahlke wrote:
> > Doesn't this miss the also destructive </script problem?
>
> Yes it does, but "</script>" does not seem to appear in the wild.
> In other words, web developers and generators are careful not to crank out
> the string "...</script>..." because </script> anywhere ends the script.
> My parser has made this assumption for ten years, and it's pretty reliable.
>
> They're more than happy to write
> var a = "<script>";
> but not
> var a = "</script>";
> the latter often written as
> var a = "</scr" + "ipt>";
> This too generates a tidy warning, because </s shouldn't appear,
> or </ any letter for that matter,
> but it does not cause trouble and does not derail the script.
> So my early research suggests we're ok here.
Ok that makes sense. In which case I'm unclear as to why tidy5 wouldn't handle
<script inside a script tag. Has anyone posted to their mailing list yet about this?
Cheers,
Adam.
* Re: [Edbrowse-dev] tidy script bug workaround
From: Chris Brannon @ 2015-09-14 18:37 UTC (permalink / raw)
To: Edbrowse-dev
Adam Thompson <arthompson1990@gmail.com> writes:
> Has anyone posted to their mailing list yet about this?
Yes, I have.
I haven't heard back from them yet.
-- Chris