From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-08v.sys.comcast.net (resqmta-ch2-08v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:40]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 9298377AA9 for ; Sat, 29 Aug 2015 03:01:54 -0700 (PDT) Received: from resomta-ch2-12v.sys.comcast.net ([69.252.207.108]) by resqmta-ch2-08v.sys.comcast.net with comcast id Aa451r0012LrikM01a45Lu; Sat, 29 Aug 2015 10:04:05 +0000 Received: from eklhad ([IPv6:2601:405:4002:b0a:21e:4fff:fec2:a0f1]) by resomta-ch2-12v.sys.comcast.net with comcast id Aa451r0010GArqr01a45jx; Sat, 29 Aug 2015 10:04:05 +0000 To: Edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke User-Agent: edbrowse/3.5.4.2+ Date: Sat, 29 Aug 2015 06:04:04 -0400 Message-ID: <20150729060404.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1440842645; bh=dzC6C1IVd0DyPW2NJvzbNw3xTLP2x4omEQr6iU7fwkg=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=YiIQ33/OUCmGwEyzYOfdc15o3mDZHnT3iyMqgr56oFLrOsN7WqkI8sz0g3W5cnV+9 C5Kui3O/9/VgZZUwOB+XrZ9m0fBr5QxCfDuBQP0tM/wjifU4MB7vpt/Zf701VTID8g RmTnydfy8DfYNmesxdvXS6UFSegW2L126E7CyPMReS4C0mQi76VFT4sT3H7pd4pJOT L4GzWJ1c7lsuBe2tX8vHpKLmWOBwVGChK5mkryITaJgQq6Jym0ieLJxmXQVUb2lLXH KOQGu37IzlVv30j65DE1wO9Zc0f7TUNCK6m486nLxayE6AY8jeG9iJgEy0PylMDs7b 42OiLwqlEaHaA== Subject: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Aug 2015 10:01:55 -0000 Debug prints are in, and seem to work. Thanks Kevin. Here is my test page, that I was worried about. 
jf test hello world

I browse with db6; there is lots of js debugging, which I'll leave out. Here are the relevant lines.

line 7 column 67: '<' + '/' + letter not allowed here
Node(0): Text
Text: hello world
Node(0): script
type = text/javascript
Node(1): Text
Text: document.writeln("This is <A href=http://edbrowse.org>our website<\/A>");
# end of tidy debug output, next stuff is ours
execute jf at 6
< side effects
w{This is our website
`~@}
< ok
execution complete
docwrite 62 bytes
<<
This is our website
>>
anchorSwap 4
anchors unframed
whitespace combined

Right off the bat I'm concerned because tidy shows an error where there is no error.
It is trying to interpret the tag in the string, in the script, and it shouldn't be doing that at all.
Next I look at the text node under the script, the text that is to be passed to the js engine, and it has been html escaped.
is now <a>
Why?
That would totally screw things up.
Is it escaped and interpreted for the benefit of printing, for us, or is it done by cleanup?
If the latter, then we can't use tidy5 unless this is fixed.
This is a show stopper.
They can't be mucking with the contents of a js script at all.
In fact they shouldn't muck with the contents of any script.
Karl Dahlke From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com [IPv6:2a00:1450:400c:c05::22d]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 742BA77D0D for ; Sat, 29 Aug 2015 06:23:08 -0700 (PDT) Received: by wicne3 with SMTP id ne3so38836101wic.0 for ; Sat, 29 Aug 2015 06:25:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=y0dN8vch7wdmV2J52LK8n7/UZ4JQmpIs9Mc3GgK0LPc=; b=tZU3fUS1YYEpMxO6w2Ft0mMmhfe1Wn+bVsZGHuuicCEqXjhj8xDXcZ34vMAAIvElVn R7kTXiaGm5pDZgS7qGX/A2At6+w0JyZN/VBkcYzW22iZIeqTuZlXsF1ycMHFzAm3As6u pJ1pmTOuiJEGCnxJrNLgakHveSL47ulHHg4mqLbZiwfl5x3AQaR2TLRVdKWqB6P3+jOr FaNKIa46MrKKsAZNJWi0xjErAiBoCvHqhNzaRhflCS1fjJ2qjqj99BmAbRGSw2j3ym2M /JWdnCWo8iMB/x1oMW0R497OEt56B+Xju1way6S+6Z1wK6TP2T8q+ClgQzVa4I63gFy5 OjkQ== X-Received: by 10.194.85.130 with SMTP id h2mr18370633wjz.2.1440854718948; Sat, 29 Aug 2015 06:25:18 -0700 (PDT) Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk. 
[2001:8b0:1142:9042::2]) by smtp.gmail.com with ESMTPSA id go5sm8619681wib.5.2015.08.29.06.25.17 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 29 Aug 2015 06:25:18 -0700 (PDT) Date: Sat, 29 Aug 2015 14:25:16 +0100 From: Adam Thompson To: Karl Dahlke Cc: Edbrowse-dev@lists.the-brannons.com Message-ID: <20150829132516.GD31434@toaster.adamthompson.me.uk> References: <20150729060404.eklhad@comcast.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="iVCmgExH7+hIHJ1A" Content-Disposition: inline In-Reply-To: <20150729060404.eklhad@comcast.net> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Aug 2015 13:23:08 -0000 --iVCmgExH7+hIHJ1A Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, Aug 29, 2015 at 06:04:04AM -0400, Karl Dahlke wrote:
> Debug prints are in, and seem to work.
> Thanks Kevin.

Well done all for the work. Apologies for being somewhat inactive recently, but I've been busy with the day job and simply haven't had any time to do anything computing related outside of the office.

> Here is my test page, that I was worried about.
>
> jf test
>
> hello world
>
> I browse with db6, there is lots of js debugging,
> I'll leave that out, here are the relevant lines.
>
> line 7 column 67: '<' + '/' + letter not allowed here
> Node(0): Text
> Text: hello world
> Node(0): script
> type = text/javascript
> Node(1): Text
> Text: document.writeln("This is <A href=http://edbrowse.org>our website<\/A>");
> # end of tidy debug output, next stuff is ours
> execute jf at 6
> < side effects
> w{This is our website
> `~@}
> < ok
> execution complete
> docwrite 62 bytes
> <<
> This is our website
> >>
> anchorSwap 4
> anchors unframed
> whitespace combined
>
> Right off the bat I'm concerned because tidy shows an error
> where there is no error.
> It is trying to interpret the tag in the string, in the script,
> and it shouldn't be doing that at all.

Actually, yes it should. This is one of the corner cases with html; everything within a script tag is not parsed except the sequence

> Next I look at the text node under the script,
> the text that is to be passed to the js engine, and it has been html escaped.
> is now <a>
> Why?
> That would totally screw things up.
> Is it escaped and interpreted for the benefit of printing, for us,
> or is it done by cleanup?
> If the latter then we can't use tidy5 unless this is fixed.
> This is a show stopper.
> They can't be mucking with the contents of a js script at all.
> In fact they shouldn't muck with the contents of any script.

Actually, see above for how script tags should behave. I think it's actually our current parser which is slightly broken. Not sure about the escaping part though.

Cheers,
Adam.
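The tidy warning quoted above ("'<' + '/' + letter not allowed here") matches the HTML 4 rule for script content, where the first "</" followed by a letter is treated as the end of the element's data. A minimal sketch of that check, assuming nothing about how tidy or edbrowse implement it; the function name find_end_tag_open is hypothetical and exists only for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of tidy or edbrowse.
 * Scans script text for the first "</" that is immediately followed by
 * an ASCII letter, the sequence the HTML 4 rule treats as ending the
 * script data. Returns its offset, or -1 if the sequence is absent. */
long find_end_tag_open(const char *script)
{
    size_t i;

    assert(script != NULL);
    for (i = 0; script[i] != '\0'; i++) {
        /* script[i + 2] is only read when script[i + 1] is '/',
         * so the scan never runs past the terminating NUL. */
        if (script[i] == '<' && script[i + 1] == '/' &&
            isalpha((unsigned char)script[i + 2]))
            return (long)i;
    }
    return -1;
}
```

Under this rule a string containing "</A>" is flagged, while the backslash form "<\/A>" is not, because the character after '<' is no longer '/'. A bare comparison such as if(3 < 4) is not flagged either.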
--iVCmgExH7+hIHJ1A Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJV4bK8AAoJELZ22lNQBzHO4YIH/10VgNocjgQP1LDTDjW5c6M5 /QAfo/zwDSFDICzmaFmwpbb/4bnMfADh4n+qPZIkFM9UA8+SzMqDU3JgA/mgMBrf 66AiG8u2aaG9x13jPsvJl7sq+GCoGtTMLVjCDVAhAVkSxzLMU9sl29Su+NnzAAp/ t8Yg0/qa51FjjmYwwicqe5tIMVLrE/x9yu8/ttoPmY+xn4vMxzThN1Dn+XzVAyiI AYeHR41gRY24WvhP6rR2kCVYuCva5wN7aQ5cFOkPMY4SQlApbrma5kc7HwzkfUtt GQFPkeOM1/y/tSkDOdDo/pHQcdGwMXmCk9jx7gh7xM9u6uSwOMzakVWonb9PxX4= =wjfS -----END PGP SIGNATURE----- --iVCmgExH7+hIHJ1A-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-03v.sys.comcast.net (resqmta-ch2-03v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:35]) by hurricane.the-brannons.com (Postfix) with ESMTPS id AA8A577BBD for ; Sat, 29 Aug 2015 07:33:58 -0700 (PDT) Received: from resomta-ch2-18v.sys.comcast.net ([69.252.207.114]) by resqmta-ch2-03v.sys.comcast.net with comcast id Aebp1r0052Udklx01ec9S2; Sat, 29 Aug 2015 14:36:09 +0000 Received: from eklhad ([IPv6:2601:405:4002:b0a:21e:4fff:fec2:a0f1]) by resomta-ch2-18v.sys.comcast.net with comcast id Aec91r00D0GArqr01ec9R1; Sat, 29 Aug 2015 14:36:09 +0000 To: Edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke References: <20150729060404.eklhad@comcast.net> <20150829132516.GD31434@toaster.adamthompson.me.uk> User-Agent: edbrowse/3.5.4.2+ Date: Sat, 29 Aug 2015 10:36:09 -0400 Message-ID: <20150729103609.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1440858969; bh=Ls7kF5iESHn11ZCylc909NysXdrQ5AzfD5+GAJTcWXc=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=L5IprayFtw2mdz46ar+YXCFLfvRzxA0d6fWfESWJFomt/3hMLSb4RixYynJea189U FaTbiwZn0mK+WkFqhPjxB6U33rrcoY9wnts0nRHP9y4XXi0QSbconi0h1WB4UGV0rC 
u2ywkeeCelHmaTl1kQS5Vq0DL6yZNgMpoPjoTf96yilzHO4qOrfMU+KbW8dYYYeUjo SYCyVfBzgZMb0J6E8BinCn1NW2oZSupG70BPq0yOV6D0J40qjhzoGwoFBF+UtN+1o8 vf1c8lyWAnn/fol0U6MazO/S09d2/DdbxLYMDZ84MRt98H3agBBhyWPw1Hqsy/2TjR 9iwgIr3xi2HaQ== Subject: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Aug 2015 14:33:58 -0000

> If we want to print tidy warnings for local files fair enough,
> but lets make it a user's choice,

We can come up with a better convention, similar to nojs and novs in the config file.
I'm fine with that, we're just playing so to speak.
But I'm not going to worry too much about conventions until we know tidy5 is usable, which I'm not sure it is.

You're talking about what the specs say, I'm talking about what's on the internet that we have to honor.
I'm sure I could find document.write with tags in it if I look around.
But ok, if you didn't like my last sample script then try this one.
I know this is to spec.

jf test hello world

Turn off js, we don't need it here, set db6, browse, and see that < is turned into &lt;.
That would be a syntax error.
Tidy is mucking with the script.
Karl Dahlke From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) by hurricane.the-brannons.com (Postfix) with ESMTPS id E10E977ABB for ; Sat, 29 Aug 2015 07:56:11 -0700 (PDT) Received: by wicne3 with SMTP id ne3so39908211wic.0 for ; Sat, 29 Aug 2015 07:58:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=zIyWP7K07ux2utQA99suDpukuPyj6t4TYUFDWXcEOJE=; b=z+ZG7EoywGVZokEVbKk7H649GN/p6koU4ev3EGJmM5vOv/kuM2KW/bAEqUwI6TkIZc Uy/MXmNcZU3lIY4gkl6u7uTf/RV8EF/P8FgYkecHPhz8R0GrBDcnNGTmJoh++qaHhDTW iGRQQYDp0GCwiTfEYHxCoJLO0RafAcPwg5QmLBBFjF6haViPV4N3f/a2Ae/7QZwIlZEF 7H8i0cCwyrA9wptHbHW9QJZfsy22A86xo5ai1SaEqDlvEI2XpL2tzjz2XmoRgvdaBq/r oNjDA/i+MFD7HjWg6M7nHNiYRdvKuE92QaeGMFsOz3ZtCqcWGFDpYTCZglQIKKoWOGzG PrNA== X-Received: by 10.180.23.132 with SMTP id m4mr9685639wif.89.1440860302654; Sat, 29 Aug 2015 07:58:22 -0700 (PDT) Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk. 
[2001:8b0:1142:9042::2]) by smtp.gmail.com with ESMTPSA id hn2sm13142204wjc.45.2015.08.29.07.58.21 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 29 Aug 2015 07:58:21 -0700 (PDT) Date: Sat, 29 Aug 2015 15:58:20 +0100 From: Adam Thompson To: Karl Dahlke Cc: Edbrowse-dev@lists.the-brannons.com Message-ID: <20150829145820.GE31434@toaster.adamthompson.me.uk> References: <20150729060404.eklhad@comcast.net> <20150829132516.GD31434@toaster.adamthompson.me.uk> <20150729103609.eklhad@comcast.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="xA/XKXTdy9G3iaIz" Content-Disposition: inline In-Reply-To: <20150729103609.eklhad@comcast.net> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Aug 2015 14:56:12 -0000 --xA/XKXTdy9G3iaIz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Aug 29, 2015 at 10:36:09AM -0400, Karl Dahlke wrote: > > If we want to print tidy warnings for local files fair enough, > > but lets make it a user's choice, >=20 > We can come up with a better convention, > similar to nojs and novs in the config file, > I'm fine with that, we're just playing so to speak. > But I'm not going to worry too much abount conventions > until we know tidy5 is usable, which I'm not sure it is. >=20 > You're talking about what the specs say, I'm talking about > what's on the internet that we have to honor. > I'm sure I could find document.write with tags in it if I look around. No, this just doesn't work in anyone's conforming implementation. Apparently it's a difference between html and xhtml or something, I'm not entirely sure. 
> But ok, if you didn't like my last sample script then try this one.
> I know this is to spec.
>
> jf test
>
> hello world
>
> Turn off js, we don't need it here, set db6, browse,
> and see that < is turned into &lt;.
> That would be a syntax error.
> Tidy is mucking with the script.

Haven't run the edbrowse test yet, but I downloaded, compiled and installed tidy5 and ran your script through their executable. The script came out perfectly fine (and yes, I know tidy consumed the html because it added the generator field). Not sure what we're doing differently. I wonder if it's just a printing thing or something.

Cheers,
Adam.

--xA/XKXTdy9G3iaIz Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJV4ciLAAoJELZ22lNQBzHO67MH/2gZ1MubEGC8nHkZiXKdwE3J 0tvo4zab2kKPV/my6n6YJY6dVKOry/OgerNjy4XtFs5YDlrZsf2DryS52AM0fv+f iTw0Qki+OGizLNVQxUTTSNfqyiN/w0B3oWEDkKCNsR67QHPiMo/aVb73nVxKRT4f 7jFE+D2Jn+6o6P3iNceITqud4ydeYe7CJPTryKplYXop6jO2i8TssX5lhXUHZXSk /psdTrBHGlHFj81wUbw+3c/DqvCruxN3wcWL04/f8buyw8Lih0ANY23BB47WJcjO WF5u/wYcZG1x5eqCDJuWQiiZoss6Xza8OOLiV5roOQzg7c0OsXuJloBS7h8TYQc= =/BvU -----END PGP SIGNATURE----- --xA/XKXTdy9G3iaIz-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-06v.sys.comcast.net (resqmta-ch2-06v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:38]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 176D177AA9 for ; Sat, 29 Aug 2015 09:03:39 -0700 (PDT) Received: from resomta-ch2-11v.sys.comcast.net ([69.252.207.107]) by resqmta-ch2-06v.sys.comcast.net with comcast id Ag5V1r00B2Ka2Q501g5r2G; Sat, 29 Aug 2015 16:05:51 +0000 Received: from eklhad ([IPv6:2601:405:4002:b0a:21e:4fff:fec2:a0f1]) by resomta-ch2-11v.sys.comcast.net with comcast id Ag5q1r00F0GArqr01g5qTc; Sat, 29 Aug 2015 16:05:50 +0000 To: Edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke References:
<20150729060404.eklhad@comcast.net> <20150829145820.GE31434@toaster.adamthompson.me.uk> User-Agent: edbrowse/3.5.4.2+ Date: Sat, 29 Aug 2015 12:05:50 -0400 Message-ID: <20150729120550.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1440864351; bh=31zS7LFtBv5K03boSyY6Rl0EhOUgWh59W41vmGRE6WI=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=saplxV56X8j8E1PDUx26Itk11C8DO+MAowPPSVEgiwl6SA+D5fu3e3+U3QczBDyJk d1a0P+uksDUs+zE1XdNyNeDw0XjSeo+xNXS1bHckkar594gDQDJb0DB19Frt+Eg7b1 7x0Yp4h/pNAwZ6/nY9fa+crMhKfOe3iUl+bH+So8aem3ZSMddnRyHvJIAVUfzM90BC 8sjP0OXs5c3brATDW3gewby7C00GQQQuOhZZCiZZEbnCwxh0+cTA9WlJCurzNG39+M m46T1E2mPM0dkwubfHlBsa9C8okZjcj69XnhuGwhqXM0fpn137zWvj/iSsny8alfOF EdijmBv3lMLLw== Subject: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Aug 2015 16:03:40 -0000

> If we want to print tidy warnings for local files fair enough,
> but lets make it a user's choice,

Actually I'm kinda stupid here.
I built these machinations for local files or remote files, and exceptions through some mechanism, filename or config options etc, but really all I need to do is print the tidy errors at debugLevel 3 or above.
None of our users are going to want to see them, except a rare few who write their own html, and they can get along with db3 to check their source files.
So that's what I should do, but it's still not what I'm worried about.

> I wonder if it's just a printing thing or something.

Oh I hope it is, but I fear it's not.
(This would be a great time for me to be wrong.)
We aren't passing the contents of script.text to the js engine yet, but some day we will, when tidy replaces all my software, and on that day we will be passing

if(3 &lt; 4)

instead of

if(3 < 4)

At least that's how it appears.

Karl Dahlke

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out.smtp-auth.no-ip.com (smtp-auth.no-ip.com [8.23.224.61]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 6FFCD77ABB for ; Sat, 29 Aug 2015 18:12:52 -0700 (PDT) X-No-IP: carhart.net@noip-smtp X-Report-Spam-To: abuse@no-ip.com Received: from carhart.net (unknown [99.52.200.227]) (Authenticated sender: carhart.net@noip-smtp) by smtp-auth.no-ip.com (Postfix) with ESMTPA id 35795400D49 for ; Sat, 29 Aug 2015 18:15:04 -0700 (PDT) Received: from carhart.net (localhost [127.0.0.1]) by carhart.net (8.13.8/8.13.8) with ESMTP id t7U1F37l006301 for ; Sat, 29 Aug 2015 18:15:03 -0700 Received: from localhost (kevin@localhost) by carhart.net (8.13.8/8.13.8/Submit) with ESMTP id t7U1F3C8006298 for ; Sat, 29 Aug 2015 18:15:03 -0700 Date: Sat, 29 Aug 2015 18:15:03 -0700 (PDT) From: Kevin Carhart To: Edbrowse-dev@lists.the-brannons.com In-Reply-To: <20150729120550.eklhad@comcast.net> Message-ID: References: <20150729060404.eklhad@comcast.net> <20150829145820.GE31434@toaster.adamthompson.me.uk> <20150729120550.eklhad@comcast.net> User-Agent: Alpine 2.03 (LRH 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 01:12:52 -0000

Hi Adam, and I'm glad that the recent round from yesterday til now is going well! Exciting stuff!
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com [IPv6:2a00:1450:400c:c05::230]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 6300477BBD for ; Sun, 30 Aug 2015 01:24:48 -0700 (PDT) Received: by wicne3 with SMTP id ne3so49261530wic.0 for ; Sun, 30 Aug 2015 01:27:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=bgFxikrB0koMqK7f3ZLen3TQm6hdO3JFGtytpevd4no=; b=jWAQJ3DFdIltGy9B6LKhlXMWmH+IScPj58XGa9aPI8k/0QJOsjDGttK4roju9QsChx NTA2/82muTd49toH71tW7oNumIeme3QNJqPyq0wIiM6TOCH8D5sBoLTLsL/Zjbf8lN4D f5coPuhjVB2/5GjxOHllbT8oQi1MDY1URbd/ccn41BmpMPZfUVXQkmJPsmigSgB8Vyto aakcoV0kclA7FQXeL1A0FUBpJFsHfZZkvm5LgZUgG8s2AL1rKDBeuhDxrRRl6nurbyJK CIP4tfo/ZkjnwaQ6Jxqi9Ezr0zsUVJYTdSAcIlUX+xP5Dr08mA1Itre5Sz77IK0EVkZL vf2A== X-Received: by 10.194.238.168 with SMTP id vl8mr21300286wjc.128.1440923220434; Sun, 30 Aug 2015 01:27:00 -0700 (PDT) Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk. 
[2001:8b0:1142:9042::2]) by smtp.gmail.com with ESMTPSA id c11sm12151022wib.1.2015.08.30.01.26.59 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 30 Aug 2015 01:26:59 -0700 (PDT) Date: Sun, 30 Aug 2015 09:26:51 +0100 From: Adam Thompson To: Karl Dahlke Cc: Edbrowse-dev@lists.the-brannons.com Message-ID: <20150830082651.GA17154@toaster.adamthompson.me.uk> References: <20150729060404.eklhad@comcast.net> <20150829145820.GE31434@toaster.adamthompson.me.uk> <20150729120550.eklhad@comcast.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="liOOAslEiF7prFVr" Content-Disposition: inline In-Reply-To: <20150729120550.eklhad@comcast.net> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 08:24:48 -0000 --liOOAslEiF7prFVr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Aug 29, 2015 at 12:05:50PM -0400, Karl Dahlke wrote: > > If we want to print tidy warnings for local files fair enough, > > but lets make it a user's choice, >=20 > Actually I'm kinda stupid here. > I built these mashinations for local files or remote files, > and exceptions through some mechanism, filename or config options etc, > but really all I need to do is print the tidy errors at debugLevel 3 or a= bove. > None of our users are going to want to see them, > except a rare few who write their own html, > and they can get along with db3 to check their sourcefiles. > So that's what I should do, but still not what I'm worried about. I agree that's the correct approach. > > I wonder if it's just a printing thing or something. 
>=20 > Oh I hope it is, but I fear it's not. > (This would be a great time for me to be wrong.) > We aren't passing the contents of script.text to the js engine yet, > but some day we will, when tidy replaces all my software, > and on that day we will be passing >=20 > if(3 < 4) >=20 > instead of >=20 > if(3 < 4) >=20 > At least that's how it appears. As I said, this doesn't tally with what I'm seeing from the actual tidy tool (the tidy 5 version) which uses the same library as we do, so I suspect we're doing something incorrectly somewhere. I'll have a look at that code to see what they're doing differently. --liOOAslEiF7prFVr Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJV4r5LAAoJELZ22lNQBzHOx2gIAM1vMqYjRGepPEQUOYoaPQjJ X4uvmOu//l/jLJYO+V2rQf8y0IDo3WftMB5O+Dl/Cshun5vikkd7RyOZ9KIp5C92 L1xe7DJwkZjZq13l5epjzJ0V7BFveyOiqQ1LLt7p9DuKBaEATQEqdxfWs7HPzwXn ndFsz8qboq9ljFhGwkFLJsu1XdvSRjA3DYDtCzUVOzAQmHIGWB3XkjY1bbGLna4i 9hursfyX0juKuT5UvhyBbYV/yplpESbQSOWGdzOGTAsCWK5el9IfW5mAJYL0WbsG RyXxG6IWVOlBHUISG2NUGSxSti5FFhlm0YlPpjtNc0eNJ9K2ZycLQBoaRSL1qkg= =2n7+ -----END PGP SIGNATURE----- --liOOAslEiF7prFVr-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 22ED877D0D for ; Sun, 30 Aug 2015 02:28:19 -0700 (PDT) Received: by wicne3 with SMTP id ne3so49996109wic.0 for ; Sun, 30 Aug 2015 02:30:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=9IRWnt/gffNRd7GzinJGb3wULtGqAP5QrVL+0kMH7xE=; b=LgiJ6kUsz2tB6plDmPSxmWdGA5apnUVEyXr0q+dHwmDBQfZ5OTXdwExpTjz8HIKuXh vhbjJlFVG8AxaSzgiNOM6HfKs3qfpdmcRTW7zMp+tYJywraTFrk4/ess0ffVoX0JFnMy 
10WWaF3cz6D3ajiN5TmxmHS1pWMxT2xpCNuIws/47FPNKCT+TiarUsdv9brFiFkw6/KW s4VFXNHSZISKMcQvRWVono4OwoDx+T/sAsfos/bUZkVKwbqK7ujgBlbMRFUhCIw8gtmz zZ/TvJ18gd91DUQqP+hhhD+HwH1dQo/kdoXEzy/hB/EaNHZUYIBioPwfS0QYHKN/pkvA vkCQ== X-Received: by 10.181.12.40 with SMTP id en8mr2788203wid.75.1440927031499; Sun, 30 Aug 2015 02:30:31 -0700 (PDT) Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk. [2001:8b0:1142:9042::2]) by smtp.gmail.com with ESMTPSA id hn2sm16536536wjc.45.2015.08.30.02.30.30 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 30 Aug 2015 02:30:30 -0700 (PDT) Date: Sun, 30 Aug 2015 10:30:29 +0100 From: Adam Thompson To: Karl Dahlke Cc: Edbrowse-dev@lists.the-brannons.com Message-ID: <20150830093029.GB17154@toaster.adamthompson.me.uk> References: <20150729060404.eklhad@comcast.net> <20150829145820.GE31434@toaster.adamthompson.me.uk> <20150729120550.eklhad@comcast.net> <20150830082651.GA17154@toaster.adamthompson.me.uk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="bCsyhTFzCvuiizWE" Content-Disposition: inline In-Reply-To: <20150830082651.GA17154@toaster.adamthompson.me.uk> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 09:28:19 -0000 --bCsyhTFzCvuiizWE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Aug 30, 2015 at 09:26:51AM +0100, Adam Thompson wrote: > On Sat, Aug 29, 2015 at 12:05:50PM -0400, Karl Dahlke wrote: > > > If we want to print tidy warnings for local files fair enough, > > > but lets make it a user's choice, > >=20 > > Actually I'm kinda stupid here. 
> > I built these machinations for local files or remote files,
> > and exceptions through some mechanism, filename or config options etc,
> > but really all I need to do is print the tidy errors at debugLevel 3 or above.
> > None of our users are going to want to see them,
> > except a rare few who write their own html,
> > and they can get along with db3 to check their sourcefiles.
> > So that's what I should do, but still not what I'm worried about.
>
> I agree that's the correct approach.

Thanks for making this change.

> > > I wonder if it's just a printing thing or something.
> >
> > Oh I hope it is, but I fear it's not.
> > (This would be a great time for me to be wrong.)
> > We aren't passing the contents of script.text to the js engine yet,
> > but some day we will, when tidy replaces all my software,
> > and on that day we will be passing
> >
> > if(3 &lt; 4)
> >
> > instead of
> >
> > if(3 < 4)
> >
> > At least that's how it appears.
>
> As I said, this doesn't tally with what I'm seeing from the actual tidy tool
> (the tidy 5 version) which uses the same library as we do,
> so I suspect we're doing something incorrectly somewhere.
> I'll have a look at that code to see what they're doing differently.

It turns out what we wanted was tidyNodeGetValue, which, unlike tidyNodeGetText, doesn't do escaping. I've made, tested and pushed this change.
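To see why the escaped form worried Karl, here is a minimal sketch of what markup escaping does to a line of script. It does not call libtidy and is not how tidyNodeGetText is implemented; escape_markup is a hypothetical helper that only illustrates the effect an escaping getter has on text destined for a js engine:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Copies src into dst, replacing '<', '>' and '&' with HTML entities.
 * Returns 0 on success, -1 if dst (of size dstlen) is too small. */
int escape_markup(const char *src, char *dst, size_t dstlen)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    for (; *src != '\0'; src++) {
        char one[2];
        const char *rep;
        size_t n;

        one[0] = *src;
        one[1] = '\0';
        switch (*src) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = one;     break;
        }
        n = strlen(rep);
        /* Leave room for the terminating NUL. */
        if (used + n + 1 > dstlen)
            return -1;
        memcpy(dst + used, rep, n);
        used += n;
    }
    dst[used] = '\0';
    return 0;
}
```

Feeding if(3 < 4) through this helper yields if(3 &lt; 4), which is not valid javascript. Text read for execution therefore has to come from a getter that returns the raw value, which per Adam's message is what tidyNodeGetValue provides.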
--bCsyhTFzCvuiizWE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJV4s00AAoJELZ22lNQBzHOuqQH/iNpFl0Z+kMeoxxplu9E7zt2 zSHD+A5/DG/vdEfMMdU6aUL/93qk0Zm8gWe1fHS5P6Fkwi0i0zyFS6Xm3W8wNM+K 3HD/doSyPrShbR603FA4vBYykpwkYlAQlhbMCzIA3a5JTgSloywO8PJv31uU0pE5 ltJ0nvFoNgTBbpSRepS9PGGe2eOOybLiUT/Re2iHohW4AiDy66vnDhyWLY3nj1Ah zGGADdjNBtncsV8YzK29ZOvsRowLBDXj+fr7qEIJvBB+y/+mAqUCdPMERyDqy7W9 7FDcOFIa2xH600O9NAeZj3Fe+ZyIBuAn4xgGfclxx8AkyQ5ek+zR7wVhznziB/g= =K8ck -----END PGP SIGNATURE----- --bCsyhTFzCvuiizWE-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-07v.sys.comcast.net (resqmta-ch2-07v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:39]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 174FB77AA9 for ; Sun, 30 Aug 2015 02:47:36 -0700 (PDT) Received: from resomta-ch2-01v.sys.comcast.net ([69.252.207.97]) by resqmta-ch2-07v.sys.comcast.net with comcast id Axpp1r00226dK1R01xppsW; Sun, 30 Aug 2015 09:49:49 +0000 Received: from eklhad ([IPv6:2601:405:4002:b0a:21e:4fff:fec2:a0f1]) by resomta-ch2-01v.sys.comcast.net with comcast id Axpo1r0080GArqr01xpoUK; Sun, 30 Aug 2015 09:49:48 +0000 To: Edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke References: <20150729060404.eklhad@comcast.net> <20150830093029.GB17154@toaster.adamthompson.me.uk> User-Agent: edbrowse/3.5.4.2+ Date: Sun, 30 Aug 2015 05:49:48 -0400 Message-ID: <20150730054948.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1440928189; bh=hi2sQOfBrEyMzJt1KoG4mJonpolKdIbleJRyA3Vi2kU=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=SbI3IyIBUWKXPwI/kHBfvLmGoKyMqMl1pRvNXr/v7rTyt0qsSs+DYyjXcjySkRTYK k6GG4BteGR7Z9DtWiLf3Ws4lv3UcFeJDXLa7v6YfOzFR0vOwoQqG1BaACF3eZA3qa1 
oIqshG4GbkGRhlS/e9GRQNpX1MeYWlQp20+P5S/EwLnOhshx1jeE2NioLwODFqVK9R u+EY78aibbMkpUbhy6Kj0DfXgAjuIRLX/CGjoXKfC1jRuffEREUAaDodD7PkyIsYAd /ahO2i1dJOs6/PXw8hCD13dj063JyS0F6GTYFixETXtRQk1ck6HgK1T3sQdNCtzRot CTscaMMo7Uh+A== Subject: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 09:47:36 -0000 > It turns out what we wanted was tidyNodeGetValue which doesn't do escaping Well will you look at that! Like I say, it's a great time to be wrong. So I guess we're back in the saddle again. Do keep an eye on us over here, and keep us pointed in the right direction. Thanks. Karl Dahlke From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com [IPv6:2a00:1450:400c:c05::22d]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 1DCD477AA9 for ; Sun, 30 Aug 2015 03:00:29 -0700 (PDT) Received: by wicne3 with SMTP id ne3so50368014wic.0 for ; Sun, 30 Aug 2015 03:02:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=WxInaXCawQXfI6cmTT+vXZsTYGu163+70x0r4FGHD/o=; b=iE3ve5Vadf+ECDydMWbddkq0e7LDqRk1QkqIHObpFvBFRC91NW1bG1Vjhj4WTRws+K 7cBb38SntNdr7CXdNCmzxyQNYWMVttm4zOxV96QklVpLZyMyoiM/3W0YXd3lhnbyBynj hyeUflurjJDLhCA1AfFaYpRUCO04xWZEqhu7nAn7bky3/XQgl9JNPhYDgPWnOFV5uLBw erTE2ZFmhqdCrOiSnlFdz3oGveDOF37/L3KoOzdcBuvtZypXcrFpWv8V+SFKSn+WLiCu uffbzY3unRRY0CDIHFviriAZVJLsrsnDDSfBkG7oY3lCUDfP9FQ33ZvOd2/IuAViaTw8 kQSg== X-Received: by 10.180.76.232 with SMTP id n8mr14291632wiw.72.1440928961648; Sun, 30 Aug 2015 03:02:41 -0700 (PDT) Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk. 
[2001:8b0:1142:9042::2]) by smtp.gmail.com with ESMTPSA id e8sm12476759wiz.0.2015.08.30.03.02.40 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 30 Aug 2015 03:02:40 -0700 (PDT) Date: Sun, 30 Aug 2015 11:02:39 +0100 From: Adam Thompson To: Karl Dahlke Cc: Edbrowse-dev@lists.the-brannons.com Message-ID: <20150830100239.GC17154@toaster.adamthompson.me.uk> References: <20150729060404.eklhad@comcast.net> <20150830093029.GB17154@toaster.adamthompson.me.uk> <20150730054948.eklhad@comcast.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="zCKi3GIZzVBPywwA" Content-Disposition: inline In-Reply-To: <20150730054948.eklhad@comcast.net> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 10:00:29 -0000 --zCKi3GIZzVBPywwA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sun, Aug 30, 2015 at 05:49:48AM -0400, Karl Dahlke wrote:
> > It turns out what we wanted was tidyNodeGetValue which doesn't do escaping
>
> Well will you look at that!
> Like I say, it's a great time to be wrong.
> So I guess we're back in the saddle again.
> Do keep an eye on us over here, and keep us pointed in the right direction.
> Thanks.

Will do. Any chance you could have a go at converting some of the parsing logic today (I've got another day off work tomorrow as it's a public holiday over here so will be able to review)? I'd do it myself but would need to spend some time getting my head around our current parser in order to unwind it.
--zCKi3GIZzVBPywwA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJV4tS/AAoJELZ22lNQBzHONiEIAI8UR+VrGWVjduCDtZITk86L qWdPsgU33xmHwYw7S3U19RoPJzj4mUriB1RngzhVrwoI+mAG86scgQ6hgGHPlfDT tvaZ22K9/Kui0+YQChtYgUr2HfhdR3DVbmC9sOm7T6JHzU1mClcMynI0L3Lo2tdV A7BOrvXI4e4azlWrILF0J5ovApCOp2QEYseCWi7VliOmqyJXHXNEYXncYRWCJlMg dU/ASoLBJZmOI1Ups5DSPiShJW7uk+O5yaoBit1YJAp8ZE0G2INwtX9NXjOqw2De lcvHzqAMBCMutWch804fqw7aU9ovQLfKMCEwE55LgBpojFqbQSRsMKvCvgI5iQU= =bPDK -----END PGP SIGNATURE----- --zCKi3GIZzVBPywwA-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-08v.sys.comcast.net (resqmta-ch2-08v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:40]) by hurricane.the-brannons.com (Postfix) with ESMTPS id CA61077BBD for ; Sun, 30 Aug 2015 03:28:58 -0700 (PDT) Received: from resomta-ch2-16v.sys.comcast.net ([69.252.207.112]) by resqmta-ch2-08v.sys.comcast.net with comcast id AyX21r0022S2Q5R01yXB7o; Sun, 30 Aug 2015 10:31:11 +0000 Received: from eklhad ([IPv6:2601:405:4002:b0a:21e:4fff:fec2:a0f1]) by resomta-ch2-16v.sys.comcast.net with comcast id AyXB1r0090GArqr01yXBii; Sun, 30 Aug 2015 10:31:11 +0000 To: Edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke References: <20150729060404.eklhad@comcast.net> <20150830100239.GC17154@toaster.adamthompson.me.uk> User-Agent: edbrowse/3.5.4.2+ Date: Sun, 30 Aug 2015 06:31:11 -0400 Message-ID: <20150730063111.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary=nextpart-eb-711431 Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1440930671; bh=7DyOf6NN/nmCISSWpnbJyQnIYZ0IFcl5/hWBYYVFmh8=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=pZ4u5AqjvLn8jwktFJ5KkcRPKGfprt3Vsb81ex/xd95M45gBha0WVGHDU3jrP185h 
A6G4EK0ua69kZfzMRckYO+15YOXMgv5F/uw2DOl4MRnLv8sTE4seFR7OubsAL7aunN zQ8hqmtlD/tFnGGQdEwZTPjVJWaOsJRS2wy/X4g5B6eDE5LDlopn8LSTzVwc8vv24i mt9jpG+nAvxJKiU2+RRjPOmK9LW55uDjw6FxOwpE7smojhzzqWNKQSSf+no67otfMg 3eMJFeO7auJmlmsaH3VoSxRepGaOQpyBstByxL0s7U7kd6o6PjQKPXGtrdcmQ+dvXb D4vjviRpkNfRw== Subject: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 10:28:59 -0000

This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible.

--nextpart-eb-711431 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

> Any chance you could have a go at converting some of the parsing logic

Wow - I'm good but not that good.
It's a pretty big project.
I don't want to move too slowly, having done almost nothing on edbrowse
in the past 6 months, but I don't want to run recklessly fast either.
Need to pass designs by you guys before coding etc.
And there are still some big questions to answer, like is tidy5
the right path, or perhaps libhubbub, which could be part
of a larger browser effort, larger than just parsing html.
netsurf-browser.org

I'm running another sanity check on tidy.
This generates an error because & is not escaped,
and yes it probably should be.
It even converts © into the copyright symbol,
now part of the url.
So ok, maybe I did a bad test because I'm not following spec
but the internet doesn't follow spec either, not all the time.
Look at the raw html from www.sciam.com
It contains these two lines, on the same home page.
  • Subscribe to All Access »
  • Subscribe to Print »
The first one has & escaped, the second one does not.
So ok just wanted to make sure tidy is handling these two cases properly,
and it is.
Happily, my parser also handles these cases properly.
I must have run into this at some point.

I'll continue testing. Assuming I uncover no serious problems,
I think the next step is to enhance our edbrowse node, with enough attributes
to faithfully copy the information from a tidy node.
We have some of the attributes but not enough.
A blatant omission is a text string,
because we never represented text nodes before.
We'll need this, and child pointers,
and a list of attribute value pairs, and other things.
I'll post more on this later.

Karl Dahlke

--nextpart-eb-711431--

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x236.google.com (mail-wi0-x236.google.com [IPv6:2a00:1450:400c:c05::236]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 5DA5C77D0D for ; Sun, 30 Aug 2015 04:14:39 -0700 (PDT) Received: by wicne3 with SMTP id ne3so51244620wic.0 for ; Sun, 30 Aug 2015 04:16:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=9oUSWD1rI88EoUtGgm89tUHUVEo59UDzP9ELQk6xC9M=; b=Tc+NbIqtxr8g46sdBgbAwtUQ0rsViFX1kIwJ4x9HyLPqGhzx1A9oTexRrqelrpqcVe Ql0hS3ycsf4mdUsU8lh8gNPpx0V2rmctpbIuu8el5RIyXUE82ID8o/3HPkYBBSoxJj2A mi4SbVmhyqw//laLigW9FmU4PVsR5CfqHmjadtcn8fd2gZCl3qahB+YuWGg4oKu6IXhh sd9enuYEd6P4FBTrNYJ6QH21cK7So80WHVY0JUCnPvEG73XpPeRowQBp1NZQncG/drBW LNPqa2VNB58+WxnFcYVqFELu6zY7wSu1k2iaGzz1sSnHJr9j2E8nwrDHumtkjy4E2EB1 fPTw== X-Received: by 10.194.120.198 with SMTP id le6mr21147347wjb.133.1440933412008; Sun, 30 Aug 2015 04:16:52 -0700 (PDT) Received: from toaster.adamthompson.me.uk (toaster.adamthompson.me.uk.
[2001:8b0:1142:9042::2]) by smtp.gmail.com with ESMTPSA id h6sm12738057wiy.3.2015.08.30.04.16.50 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 30 Aug 2015 04:16:50 -0700 (PDT) Date: Sun, 30 Aug 2015 12:16:49 +0100 From: Adam Thompson To: Karl Dahlke Cc: Edbrowse-dev@lists.the-brannons.com Message-ID: <20150830111649.GD17154@toaster.adamthompson.me.uk> References: <20150729060404.eklhad@comcast.net> <20150830100239.GC17154@toaster.adamthompson.me.uk> <20150730063111.eklhad@comcast.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ylS2wUBXLOxYXZFQ" Content-Disposition: inline In-Reply-To: <20150730063111.eklhad@comcast.net> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev] tidy debug tree, and a js script X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 11:14:39 -0000

--ylS2wUBXLOxYXZFQ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sun, Aug 30, 2015 at 06:31:11AM -0400, Karl Dahlke wrote:
> > Any chance you could have a go at converting some of the parsing logic
>
> Wow - I'm good but not that good.
> It's a pretty big project.
> I don't want to move too slowly, having done almost nothing on edbrowse
> in the past 6 months, but I don't want to run recklessly fast either.
> Need to pass designs by you guys before coding etc.
> And there are still some big questions to answer, like is tidy5
> the right path, or perhaps libhubbub, which could be part
> of a larger browser effort, larger than just parsing html.
> netsurf-browser.org

I looked into libhubbub before going down the tidy5 route, but it seems to
have been sucked into netsurf fully now so I gave up on it.
> I'm running another sanity check on tidy.
> This generates an error because & is not escaped,
> and yes it probably should be.
>

I suspect the error here is that we're missing the closing quote in this
example actually.

> It even converts © into the copyright symbol,
> now part of the url.

Not sure about that, but this isn't a correct attribute anyways and I've
never seen this on the internet.

> So ok, maybe I did a bad test because I'm not following spec
> but the internet doesn't follow spec either, not all the time.
> Look at the raw html from www.sciam.com
> It contains these two lines, on the same home page.
>
>
  • Subscribe to All Access »
  • >
  • Subscribe to Print »
>
> The first one has & escaped, the second one does not.
> So ok just wanted to make sure tidy is handling these two cases properly,
> and it is.
> Happily, my parser also handles these cases properly.
> I must have run into this at some point.

Agreed about html not following spec; that's why I wanted to get away from a
home-grown parser.

> I'll continue testing. Assuming I uncover no serious problems,
> I think the next step is to enhance our edbrowse node, with enough attributes
> to faithfully copy the information from a tidy node.
> We have some of the attributes but not enough.
> A blatant omission is a text string,
> because we never represented text nodes before.
> We'll need this, and child pointers,
> and a list of attribute value pairs, and other things.
> I'll post more on this later.

Yeah, I think we'll need to revise our tag nodes significantly eventually,
but as a stopgap measure I was thinking to simply run through the tidy tree
and copy what we can into our existing structure.
It's far from perfect but we need to rewrite the DOM anyway at some stage.
That's going to be a really large project requiring a large set of changes to
how our tag system works. Thus I don't see the point in enhancing our existing
tags too much, though if it naturally heads towards a functional DOM
representation that's good with me.
At least this way we get more html support and probably get rid of some
parsing issues into the bargain.

Cheers,
Adam.
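The stopgap described above, walking the parsed tree and copying what we can into our own nodes, might look roughly like this. It is a sketch under assumptions: struct src_node is a stand-in for a parsed node and is not libtidy's TidyNode, and struct eb_node with its field names is hypothetical rather than edbrowse's actual tag structure. It carries the pieces listed in the thread: a text string, child pointers, and attribute name/value pairs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a parsed node (NOT libtidy's TidyNode). */
struct src_attr { const char *name, *value; const struct src_attr *next; };
struct src_node {
    const char *name;               /* tag name, or NULL for a text node */
    const char *text;               /* text content, or NULL */
    const struct src_attr *attrs;
    const struct src_node *child, *next;
};

/* Hypothetical enhanced edbrowse node; all field names are assumptions. */
struct eb_attr { char *name, *value; struct eb_attr *next; };
struct eb_node {
    char *name, *text;
    struct eb_attr *attrs;
    struct eb_node *parent, *first_child, *next_sibling;
};

static char *dup_or_null(const char *s)
{
    char *copy;
    if (!s)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy)
        strcpy(copy, s);
    return copy;
}

/* Deep-copy src and everything below it, keeping attribute and child
 * order. Text is copied byte for byte, with no escaping. Allocation
 * failure simply truncates the copy; real code would report it. */
struct eb_node *copy_tree(const struct src_node *src, struct eb_node *parent)
{
    struct eb_node *n, **ctail;
    struct eb_attr **atail;
    const struct src_attr *a;
    const struct src_node *c;

    assert(src != NULL);
    n = calloc(1, sizeof *n);
    if (!n)
        return NULL;
    n->name = dup_or_null(src->name);
    n->text = dup_or_null(src->text);
    n->parent = parent;

    atail = &n->attrs;
    for (a = src->attrs; a; a = a->next) {
        struct eb_attr *na = calloc(1, sizeof *na);
        if (!na)
            break;
        na->name = dup_or_null(a->name);
        na->value = dup_or_null(a->value);
        *atail = na;
        atail = &na->next;
    }

    ctail = &n->first_child;
    for (c = src->child; c; c = c->next) {
        struct eb_node *nc = copy_tree(c, n);
        if (!nc)
            break;
        *ctail = nc;
        ctail = &nc->next_sibling;
    }
    return n;
}

/* Free a node, its subtree, and any following siblings. */
void free_tree(struct eb_node *n)
{
    while (n) {
        struct eb_node *next = n->next_sibling;
        struct eb_attr *a = n->attrs;
        while (a) {
            struct eb_attr *an = a->next;
            free(a->name);
            free(a->value);
            free(a);
            a = an;
        }
        free_tree(n->first_child);
        free(n->name);
        free(n->text);
        free(n);
        n = next;
    }
}
```

Copying a script node that has one text child gives an eb_node whose first_child holds the script source unchanged; free_tree releases the whole copy.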
--ylS2wUBXLOxYXZFQ Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJV4uYhAAoJELZ22lNQBzHOFvAIAKqV/DrkQp6tHrTv8twmIMiJ Bw9srg2ahNKhEt6PrMf6LmjHMiLNi0WHqJIHNT80Z5fJeTG1O58/lfG0BJqHfWgH oCu9Q9yfyChWtQDx4izQpsQInQi5ZcOIF0WkluQRXoOEQ+rB6y0CgpTyziJ3dDFe UJOwup06Zw6DkQIfhqEoUnfVeMIbaxQ717rd8RzlFhcsR9i28dye/rkWpzWU1AF/ 4dYwFGZnmw/r/xIQAxLQltlCEwdHLxoibv3bjVqkkepHUsfydRPLLXLru4218bYj smVhAOwmfIq+2/h6t3ViZxf1FBCIg+DplWUTsC4uWy98tkoLa1BCvoQ0t7sP+VI= =UxNR -----END PGP SIGNATURE----- --ylS2wUBXLOxYXZFQ--