From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-05v.sys.comcast.net (resqmta-ch2-05v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:37]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 406B979E0B for ; Tue, 20 Dec 2016 11:14:47 -0800 (PST) Received: from resomta-ch2-04v.sys.comcast.net ([69.252.207.100]) by resqmta-ch2-05v.sys.comcast.net with SMTP id JPrhcSJp1GIG7JPsJcTVBn; Tue, 20 Dec 2016 19:14:59 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20161114; t=1482261299; bh=XyjEe0DAtkPOFJZye6K59iF2V+66Cgu72lucj2h8Vts=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=QTui8UwZ4F3n8jAw2GWmpMMuR4p3dQNva9o2eXLUcDaN+CPB6VkJZlEOAuv2dSOkP 9SCEnO0GnALuq1KGRzfnnsT7wV5s6yLlqETNN4kiGCcnPtd/VY4q/A1i8/89gxpn89 R9InFM6nfDL6dVZ6mywn1ozkkdeiycdcCed/8FD9blp5J4ly5n/hUKXFiD5ycMA5i7 YyvQJcUkGcw+mGYj/sCh2xPg0ZZgNhibC/i1DDUKeB75WGFPVHN2Zp5xCHJIPS83Yl EqEmwFLyhsBiTJC1Zkp9CAImyfOPg3PTZmD011WVEA7jqlNMqaif+PFEsTQBzWBKhi hweHUGQtFBTaA== Received: from unknown ([IPv6:2601:408:c301:784d:21e:4fff:fec2:a0f1]) by resomta-ch2-04v.sys.comcast.net with SMTP id JPsIcBWmFdKISJPsIc0hNW; Tue, 20 Dec 2016 19:14:58 +0000 To: Edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke User-Agent: edbrowse/3.6.2+ Date: Tue, 20 Dec 2016 14:14:58 -0500 Message-ID: <20161120141458.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary=nextpart-eb-577393 Content-Transfer-Encoding: 7bit X-CMAE-Envelope: MS4wfDYv0QkU9FtQq6wZM0xuOvbfyQ0H4/CvRSb+QVaF1IQSNWQt9RwvD83N5swFLK9VYDLysLPjxjrg88ibiQvUXNIKCsVBuI9vvmtPmHF6EVL2CWyilZtZ k9/hnVxPAO28xb+u76AlRx2nq9r6hFy9MyreduBWY2p/IjrGG331O1yS Subject: [Edbrowse-dev] X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.23 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2016 19:14:47 -0000 This message 
is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. --nextpart-eb-577393 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Please look at www.eklhad.net/nascar.html This is a stripped down version of an unsubscribe page that doesn't work, which is a shame cause I'd love to unsubscribe from nascar! The problem might be tidy. Browse it with js off and db5.
seems to throw it completely off the tracks. The form is closed as soon as <tr> comes along, and all those input items aren't part of the form, including the last submit button, so you just can't do a damn thing. The tidy team might say "It's bad html syntax" and that may be true, but we still have to parse it correctly. Karl Dahlke --nextpart-eb-577393-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (unknown [IPv6:2602:4b:a4a9:e100::63a:26e7]) by hurricane.the-brannons.com (Postfix) with ESMTPSA id A183479E0E for ; Wed, 21 Dec 2016 06:01:12 -0800 (PST) From: Chris Brannon To: Edbrowse-dev@lists.the-brannons.com References: <20161120141458.eklhad@comcast.net> Date: Wed, 21 Dec 2016 06:01:02 -0800 In-Reply-To: <20161120141458.eklhad@comcast.net> (Karl Dahlke's message of "Tue, 20 Dec 2016 14:14:58 -0500") Message-ID: <87inqd1iw1.fsf@the-brannons.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Edbrowse-dev]
X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.23 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2016 14:01:12 -0000 Karl Dahlke writes: > Please look at www.eklhad.net/nascar.html > This is a stripped down version of an unsubscribe page that doesn't work, I'm waiting a bit to see if Geoff has any input on this. I don't know whether he still follows this list. If I don't hear anything in the next few days, I'll file an issue against the tidy5 repository. As far as I can tell, it is not valid HTML, but maybe we can get some kind of workaround at parse time. -- Chris From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wj0-x235.google.com (mail-wj0-x235.google.com [IPv6:2a00:1450:400c:c01::235]) by hurricane.the-brannons.com (Postfix) with ESMTPS id D32E879E13 for ; Wed, 21 Dec 2016 09:03:48 -0800 (PST) Received: by mail-wj0-x235.google.com with SMTP id xy5so207424442wjc.0 for ; Wed, 21 Dec 2016 09:04:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=geoffair-info.20150623.gappssmtp.com; s=20150623; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=FhUYV1fg9FUCIX+4jsw8/IxjGL+4X5cfkd58VlAtCuk=; b=Pq9ByL0+Z2MDKCYCbPl01PYLH+d+A2Xr2sGFse73ZrsMNIIMWBeMIiVE4jQd8tZdbG h7mYy5V94QKiV9D4t81bw8+hAQoVJJRsCmbLVq0WT6jxlNbLzzObIerpFPp6W/NWPj5z vxErTgGhB+cHzSOQ5i/D/v+fNpHp9YdrBI6UqX7pIS7fG4343ZFytznKmsQvFd292fHC Qg776F7qKrq9DY1SbzksrBWpA0AzEVZ2LdUz3167vtAwGEKieM8xLpuKbdMzJChuIJ04 5Wm8qbrEzAyDi5uHWwGs0x1oFpDvKmjv+NnmJP57r10q6TYz5mcRRMpuDUd1piDnnGPg wNkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=FhUYV1fg9FUCIX+4jsw8/IxjGL+4X5cfkd58VlAtCuk=; 
b=pMSva9eD+uP84QmGPy3r/iajEnE7uRHXlD2XhJtbDt2Y1g4YCdN9mNLhRmAzrscIy/ /kRAdpYpmkn+nS2p2fgXy9Hgs+8xnRlrOE4IuvKdlUeN0dvzmxG1bZdr16Pf0uwgtQBQ 1UhWFWo6Tsj6L9T1oUmDkv/UIDZs5vWbaBkYDHt+wj2Ji0VNTLQoTV9z7kpE1hSYsJFR 8izrzSp9bENPqodL08PFvFczQKPXol2fqgG+tE1rLdMV+dVMtL48DKFcZqOXJ2EtXRjX U6lGpQ3J6MFnn0XZCNqjYQyWFno3NaBe6LTSbfkB5DzKXk+PviEonWmkWSFVgm4mpWok 7r2A== X-Gm-Message-State: AIkVDXJl7JZRhqwB/+k0Uiu0HVU2fSbMkXm4qlmvKf8F7pCTtOJwS9oeTa4zjxlDb4rBzw== X-Received: by 10.194.66.37 with SMTP id c5mr5281773wjt.138.1482339839343; Wed, 21 Dec 2016 09:03:59 -0800 (PST) Received: from ?IPv6:2a01:cb04:4ba:c500:a1d3:128c:9cda:9dca? (2a01cb0404bac500a1d3128c9cda9dca.ipv6.abo.wanadoo.fr. [2a01:cb04:4ba:c500:a1d3:128c:9cda:9dca]) by smtp.googlemail.com with ESMTPSA id w197sm28009541wmd.11.2016.12.21.09.03.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 21 Dec 2016 09:03:57 -0800 (PST) To: edbrowse-dev@lists.the-brannons.com, Karl Dahlke , Chris Brannon References: <20161120141458.eklhad@comcast.net> <87inqd1iw1.fsf@the-brannons.com> From: Geoff McLane Message-ID: <4449751f-0582-add8-0cb0-74ff1d69c97f@geoffair.info> Date: Wed, 21 Dec 2016 18:03:56 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <87inqd1iw1.fsf@the-brannons.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Edbrowse-dev]
X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.23 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2016 17:03:49 -0000 Hi Karl, Chris, > Please look at www.eklhad.net/nascar.html Yes, still casually follow the list, but do not always find time to run a test... unless you poke me, like now ;=)) And yes, tidy will see that as invalid html! With an error even, so no output unless forced, but IIRC you do add force-output... But even if you do that, tidy will close the form, move the script out of the table, and thus the submit line no longer has an associated form action... In reading around, like here - http://stackoverflow.com/questions/5967564/form-inside-a-table where it says - "You can have an entire table inside a form. You can have a form inside a table cell. You cannot have part of a table inside a form." But I suppose none of this helps you have a valid 'submit' button... Yes, you could file a tidy issue, but not quite sure what you would expect tidy to do in such a case? But open to ideas... Regards, Geoff. PS: Been so long, seems I have even forgotten the email and pwd I used for the list, so will add direct cc to you both... Maybe you could remind me... On 21/12/16 15:01, Chris Brannon wrote: > Karl Dahlke writes: > >> Please look at www.eklhad.net/nascar.html >> This is a stripped down version of an unsubscribe page that doesn't work, > I'm waiting a bit to see if Geoff has any input on this. I don't know > whether he still follows this list. If I don't hear anything in the > next few days, I'll file an issue against the tidy5 repository. > As far as I can tell, it is not valid HTML, but maybe we can get some > kind of workaround at parse time. 
> > -- Chris > _______________________________________________ > Edbrowse-dev mailing list > Edbrowse-dev@lists.the-brannons.com > http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from resqmta-ch2-10v.sys.comcast.net (resqmta-ch2-10v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:42]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 924CD79BF6 for ; Thu, 22 Dec 2016 10:35:34 -0800 (PST) Received: from resomta-ch2-14v.sys.comcast.net ([69.252.207.110]) by resqmta-ch2-10v.sys.comcast.net with SMTP id K8CscVu5TrC25K8DRcmf7Z; Thu, 22 Dec 2016 18:35:45 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20161114; t=1482431745; bh=PSmZaytX9qqzHva6v7IGj4C0n03ZmrKwu5b7+ZSyhNo=; h=Received:Received:To:From:Reply-to:Subject:Date:Message-ID: Mime-Version:Content-Type; b=iXYfTXkFXnVwfRBT7vdgHKSOuvTtNgHV52387YqqkYZnYeKpCQiHR8kmHlxd4cEtN gnIrv922FYqzR1v4t4jFRl7LOTj+sWevO9WF6LVfE08fS/iqM7ZzYmCerZMw76sIPW rHmEqA13z4QAsP4Qa6eFXFrW48dxdvJc188UmcPLLsTbHO2WA4BM6s1YGGpi8tWTT5 0hdJsuBlcJu1tduOGhB70nZSHKnEfmnFpU/ZxHRqrzxnJsvugB1NFnMJjoGAkv3w+2 hAdkVKawy65lu9zayObThGCJmfyJ0rLmwWz1e8Bji7ifI94ZaW/k9fmrSjFZ4fOpFV Rj1P09nIEcMvQ== Received: from unknown ([IPv6:2601:408:c301:784d:21e:4fff:fec2:a0f1]) by resomta-ch2-14v.sys.comcast.net with SMTP id K8DQcQoHk2qkOK8DQcKebh; Thu, 22 Dec 2016 18:35:45 +0000 To: ubuntu@geoffair.info, edbrowse-dev@lists.the-brannons.com From: Karl Dahlke Reply-to: Karl Dahlke References: <20161120141458.eklhad@comcast.net> <4449751f-0582-add8-0cb0-74ff1d69c97f@geoffair.info> User-Agent: edbrowse/3.6.2+ Date: Thu, 22 Dec 2016 13:35:44 -0500 Message-ID: <20161122133544.eklhad@comcast.net> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-CMAE-Envelope: MS4wfJkITHpV+QM3epyPqTUFZKwApDK9c3Se1WI/icusA2P0VuMDCMvd7j+0P3e81XOXEO8LIjm30TQMG1+ClpBNFt1/2gi+1DSUneAIqi0CfRjSYJxXZYIs 
l4OCiA4KSxh5W9ctoP3/0EWMAWBPjXhcsrlcSctU9JFGYbnRzyDa1n8gM7e92VmrHkG3DxNqwyZfoQ== Subject: [Edbrowse-dev]
X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.23 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Dec 2016 18:35:34 -0000 > And yes, tidy will see that as invalid html! And that's fine. > tidy will close the form, move the script out of the table, In an ideal world, from our point of view, it would still leave the form open. There is a </form> later on down the page. If tidy just can't do that, I could think about postprocessing the tree, moving the nodes to the right of the form down to children of the form, or some such, but every time I've tried to postmuck with the tree I've fixed one web page and broken 8 others. So I'm not fond of going down that path. Karl Dahlke From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wj0-x233.google.com (mail-wj0-x233.google.com [IPv6:2a00:1450:400c:c01::233]) by hurricane.the-brannons.com (Postfix) with ESMTPS id 3CD4379E0E for ; Thu, 22 Dec 2016 12:13:20 -0800 (PST) Received: by mail-wj0-x233.google.com with SMTP id c11so31903361wjx.3 for ; Thu, 22 Dec 2016 12:13:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=geoffair-info.20150623.gappssmtp.com; s=20150623; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=e8aNQgX42XBZdIZOkyOa3gcnlqhK1nijrQuCcvY38tU=; b=0VJSMyB5pJ1m2oL4lDPRedVlIvKytosxbW1BbH1iaNgrKTx7A3vWvuYdp9SLWo3zsO 32gfyAvlz1zrRd/ayImZWc3YYD993NznQE5lvJHeaXZPPE5rZgnEZdnzwmPN9CsAX+lt Kku/fR9Fd1SS47WMyMpmDHoPLBdLjew5fogdu1ZXVhOHshbkroBcSA2lLVMiXC10UykX 3L0HqHNOSO7LF5M1wE3C8QLE/Xw45GQW2w0BSn2s7X41IZaILEOkMel2RyUmzHSnCbDl CZArP+YdH7CkE7kqX0dV6AEpx5vGy5qC61nnefkKmuLmtcGW2sbMBL90LxThSmcpSFHQ 1JJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date 
:user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=e8aNQgX42XBZdIZOkyOa3gcnlqhK1nijrQuCcvY38tU=; b=UB3wZG10K/Z9l7pU/+6LdmmHmGdB9cJBg1ykQ/tJCFVPoKzQ9j+h80reW1krk/SiD1 VJvmyUVZMvdVHar0u34utxveFoB4d7QIl5GAphHCYjyOkfsHYvUlFm4/E53/lqwfznyD fzIK+J2INy5DFDDiwVql0+b/Oh5BYvAoY/9BaNu1fTbLQ7xZz/mNStbQt+IZdWCWU9y4 /+JEon2wURzoSoZdyfUgc3kAthsu7fzija92li9ukjcwOPF5L7CL/6Z30dCfzut1xpcG 5Ig8OApCvqsKfhx7Ym4KdMp47yTpkJZvZl9PxLD8ikdGtdwhmlnzhlmdKYw/U4h+7DPK VUGw== X-Gm-Message-State: AIkVDXIoUJO6siv7Hoj7zaz9HiJifPX+3b4N0bzC98C3ST+Riq2RR2z3S7dgIH/i8TLNzQ== X-Received: by 10.194.248.233 with SMTP id yp9mr10584039wjc.228.1482437614872; Thu, 22 Dec 2016 12:13:34 -0800 (PST) Received: from ?IPv6:2a01:cb04:4ba:c500:5443:5f21:5f01:c19f? (2a01cb0404bac50054435f215f01c19f.ipv6.abo.wanadoo.fr. [2a01:cb04:4ba:c500:5443:5f21:5f01:c19f]) by smtp.googlemail.com with ESMTPSA id e3sm2178025wjm.12.2016.12.22.12.13.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 22 Dec 2016 12:13:34 -0800 (PST) To: Karl Dahlke , edbrowse-dev@lists.the-brannons.com References: <20161120141458.eklhad@comcast.net> <4449751f-0582-add8-0cb0-74ff1d69c97f@geoffair.info> <20161122133544.eklhad@comcast.net> From: Geoff McLane Message-ID: Date: Thu, 22 Dec 2016 21:13:32 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161122133544.eklhad@comcast.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Edbrowse-dev]
X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.23 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Dec 2016 20:13:20 -0000 Hi Karl, > In an ideal world, LOL! Well we all know that does not exist! Tidy does leave the form open, waiting, as it should, for a close form, but then it hits a tr open table element, and reports - line 5 column 1 - Warning: missing close form before tr It is at this point that it *must* close the form... and carries on parsing the table row.. etc... And that is why tidy emits an error when it does eventually find a close form... I too have had the thought - does this not tell tidy that the earlier implicit form close it added was not right - but what can it do about it at that stage? > postmuck with the tree Yes, I hear you! That is *not* fun, and as you point out in fixing one page, you can break so many others... > Using libtidy You know, for a long time I have wondered why you do not write your own html parser! Not that I particularly want you to abandon libtidy... your participation has helped solve some libtidy problems... and so do hope you continue... But like any std html browser, IE, firefox, chrome, who-ever, you are not really interested in how well a document is formed... browsers can just skip over many problems... If necessary, maybe levering code from text-based web browsers, like Lynx, but in my experimentation with some of these, they too can get very hairy... It is just that once you have the html text in a buffer, it basically consists of looking for `<` and the `>`, with not too many exceptions... I have done this, with reasonable success, in several perl scripts I have written... as I am sure you probably have... like I remember in your first perl version... But I understand, this is a long, LONG way around... quite an amount of new work initially... 
But libtidy is always going to give you problems when it runs into invalid html, and its efforts to make it valid... Just some thoughts... Sorry, can not seem to help more... Regards, Geoff. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-x242.google.com (mail-lf0-x242.google.com [IPv6:2a00:1450:4010:c07::242]) by hurricane.the-brannons.com (Postfix) with ESMTPS id E0D7378C68 for ; Sun, 25 Dec 2016 04:54:13 -0800 (PST) Received: by mail-lf0-x242.google.com with SMTP id t196so6824762lff.3 for ; Sun, 25 Dec 2016 04:54:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=4jUYpPM7v4/DMm52eaFVYad6Mh26MQr5VMlbQPr0Qj8=; b=GITSB8j24d9+5B181jyE3GgDwpvqgXcR/2xRBWWZhCjnxDpliQ9TFSgTbSgkacfq8e uSgxYm3lh8JYgo77ylEjqvLhwZACgtOcdvAWjKJa9VOu+kdcJa+uQZQVh70PllaGQSWK GW8JFvEEFfGXTyWxYPH+wmVW3bd6Sv7Ox0IGWE5400Xc9nHn77gcDX+DvMiz3XM8Oj12 4dpfQGCDpZZ4h0yGnJut/q6zEimABZIuIkkA7IPCVDcubfGueDgbFpy+a8XELYogdssx ZArquBonW969KfzEpuUC+b+JIrPFOnGQnKwtoDIJRZeSkxcex2Im4erAW5Opd/KqTvFq uXSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=4jUYpPM7v4/DMm52eaFVYad6Mh26MQr5VMlbQPr0Qj8=; b=TTRALXetWvXNkp/kQZLfY4ND0C8n1jZX0icO3MiBjiZWycbiaYOHlTvxN3ppf2klvE Mz3nrq17W47ctEzbZwiPtitErjqT9ocUwxvdCathJcIna/r/mP9YVgOi97wO8UXGGF4I pZGazN+48gTdChGModa/uj222ihuXAvVDNH0vsPnAaWT4n+lq9jTvloJrMmOeqXzybqC pCdHpX+vpoI+TGEEJqdBUQ/aqcKOdrGhqE1TfgaUJHTlY0H8Et7vd0jfY7DZJdHCL4e+ KKwEe7FcG5B8uaHFt7ttdg1+hzIhkZHxVIOnjGTfczePIAR8R6HdTV60xgWdXh4mB+6D I0+A== X-Gm-Message-State: AIkVDXLwIQ2bC5riwFaTXbErSRIUx2d8802LJEwn8Xg4jg/YLJn7OuuMeYH3w7T4AVYH6A== X-Received: by 10.46.69.6 with SMTP id s6mr8787675lja.42.1482670472223; Sun, 25 Dec 2016 04:54:32 -0800 (PST) Received: from odin 
(odin.sdf-eu.org. [178.63.35.194]) by smtp.gmail.com with ESMTPSA id 66sm210166lfy.42.2016.12.25.04.54.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 25 Dec 2016 04:54:31 -0800 (PST) Date: Sun, 25 Dec 2016 12:53:59 +0000 From: Adam Thompson To: Geoff McLane Cc: Karl Dahlke , edbrowse-dev@lists.the-brannons.com Message-ID: <20161225125359.GA16190@odin> References: <20161120141458.eklhad@comcast.net> <4449751f-0582-add8-0cb0-74ff1d69c97f@geoffair.info> <20161122133544.eklhad@comcast.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="SLDf9lqlvOQaIe6s" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [Edbrowse-dev]
X-BeenThere: edbrowse-dev@lists.the-brannons.com X-Mailman-Version: 2.1.23 Precedence: list List-Id: Edbrowse Development List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Dec 2016 12:54:14 -0000

--SLDf9lqlvOQaIe6s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 22, 2016 at 09:13:32PM +0100, Geoff McLane wrote:
> > In an ideal world,
>
> LOL! Well we all know that does not exist!

Yep that's certainly true.

> Tidy does leave the form open, waiting, as it
> should, for a close form, but then it hits
> a tr open table element, and reports -
>
> line 5 column 1 - Warning: missing close form
> before tr
>
> It is at this point that it *must* close the
> form... and carries on parsing the table
> row.. etc...
>
> And that is why tidy emits an error when it
> does eventually find a close form...
>
> I too have had the thought - does this not
> tell tidy that the earlier implicit form
> close it added was not right - but what can
> it do about it at that stage?
>
> > postmuck with the tree
>
> Yes, I hear you! That is *not* fun, and as you
> point out in fixing one page, you can break so
> many others...

Agreed. The only way I can think of around this would be for tidy to keep track of any missing close tags and then "fix" its tree once it finds the closing tag. This'd be messy though and fairly difficult to do well, but would allow the forced output mode to produce complete forms etc. That being said I'm not sure how many pages that'd break... probably many.

> > Using libtidy
>
> You know, for a long time I have wondered why
> you do not write your own html parser!

We had one for quite a while but it got harder to maintain as new elements were supported and then html5 happened.

> Not that I particularly want you to abandon
> libtidy... your participation has helped solve
> some libtidy problems...
and so do hope you
> continue...
>
> But like any std html browser, IE, firefox, chrome,
> who-ever, you are not really interested in how
> well a document is formed... browsers can just skip
> over many problems...

True, but tidy can repair most of them, which is very useful. It's also a full validating html parser which, although causing some problems with invalid pages, gives us support for a lot of html which'd otherwise take quite a bit of work and maintenance.

> If necessary, maybe levering code from text-based
> web browsers, like Lynx, but in my experimentation
> with some of these, they too can get very hairy...

Yes, and adding support for dynamic page elements only makes things worse in that regard. In addition, just skipping over problems means one then needs to work around them somehow. This may take the form of ignoring them, but most of the time, particularly with js, some sort of special casing would be required. This is why repairing things (see my above comment) is so useful I think.

> It is just that once you have the html text in a
> buffer, it basically consists of looking for
> `<` and the `>`, with not too many exceptions...
>
> I have done this, with reasonable success, in several
> perl scripts I have written... as I am sure you
> probably have... like I remember in your first perl
> version...
>
> But I understand, this is a long, LONG way around...
> quite an amount of new work initially...
>
> But libtidy is always going to give you problems
> when it runs into invalid html, and its efforts
> to make it valid...

No more problems imho than we'd experience in getting a valid node tree from this kind of thing. This, actually, isn't as bad as I've seen since the form is actually closed.
I wonder if, in our case, we could detect from the tidy output that there is actually a closing tag somewhere and then attempt to post-process as Karl suggested (maybe print a warning and then have a command or option to disable this for pages where it breaks)? Any thoughts?

Cheers,
Adam.

--SLDf9lqlvOQaIe6s
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJYX8FnAAoJED6sZNk+oYF/UkYP/1TQ3QCBYQWrCv/fDPgBtg7Z
vcFcvH81O0Yqlu4J2Q4V9Gq/nkQnIPg98AKjZW1ElKEBls9Z5gG7BOKEnlAhPEes
Tk7nj+1w5N9dZ+zoATx5kl0ehHIj3ovTp1np8WX26YpYCc7vqzNxOToXJSbDdy7l
qm9YoA34G9l7W4pA19higiLyNnmKXPy0GB+7GqkPzWhayeYOXPbx3eLBGfiqDxqh
2wOBvsnml3aGhTcyZYGaeoGJf2lW1ihsWZrBNxDgQrHpdI8IgOVwHtVcJ09epOox
2wTkl7dMbt3rHbPROqA1NlWtE75M+buh/u5iHMm0U+esMEP3Lcaz3tq4b7u15YgW
eLRAfszRYVpiCr8ZNTlhjsKHQXjgvLq0MdimxLQtaKiGptiLHmUzXFl1gEpOGiWk
1xQyUdhuYEpkLIL3+XCHin9eW+tbybUqSkU4Ju7PcsoOJcKRjwj1+UG7iXKoAb8l
yMkdYRXcA5cUUYVCVxK1neJjbuQ9KGgcFIT/pwHyrxXmxbgyzSxQMXPUDXH13LVn
vrIVs/TFcUhvwn3fSxWkopAEeFT6ufxeFXK5IO8WuP14WOvaLik6t6Mnmu+J17kw
8LQkcYGASiwoixhEz/KsRDdNMzfN+406Znu8fe52WoCoQtgQsR8uzi/kdGmA+512
gRW6J69ZJgDhlP5tPiIe
=nq1F
-----END PGP SIGNATURE-----

--SLDf9lqlvOQaIe6s--
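
The post-processing idea that Karl and Adam discuss in this thread could be sketched roughly as follows. This is a minimal illustration in Python, not edbrowse or libtidy code: the `Node` class, the `owner_form` field, the `stray_close_seen` flag, and the `/unsubscribe` action are all hypothetical, and the hand-built tree only approximates what a repairing parser might emit for a form opened inside a table. Instead of moving nodes, the pass leaves the tree alone and records an owning form on each control found outside any form, which may lower the risk of breaking other pages.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Tags treated as form controls in this sketch.
CONTROLS = {"input", "select", "textarea", "button"}


@dataclass
class Node:
    """Hypothetical parse-tree node; not a libtidy or edbrowse type."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    owner_form: Optional["Node"] = None  # filled in by the pass below


def adopt_orphan_controls(root: Node, stray_close_seen: bool) -> int:
    """Associate controls outside any form with the most recent form.

    Runs only when the parser reported a close-form tag that matched no
    open form, i.e. an earlier implicit close was probably premature.
    Returns how many controls were adopted.
    """
    if not stray_close_seen:
        return 0
    adopted = 0
    last_form: Optional[Node] = None

    def visit(node: Node, inside_form: bool) -> None:
        nonlocal adopted, last_form
        if node.tag == "form":
            last_form = node
            inside_form = True
        elif node.tag in CONTROLS and not inside_form and last_form is not None:
            node.owner_form = last_form
            adopted += 1
        for child in node.children:
            visit(child, inside_form)

    visit(root, False)
    return adopted


# Stand-in for the repaired tree: the form was closed before the first row,
# so both inputs sit outside it.
form = Node("form", {"action": "/unsubscribe"})  # hypothetical action
email = Node("input", {"name": "email"})
submit = Node("input", {"type": "submit"})
root = Node("table", children=[
    form,
    Node("tr", children=[Node("td", children=[email])]),
    Node("tr", children=[Node("td", children=[submit])]),
])

print(adopt_orphan_controls(root, stray_close_seen=True))  # prints 2
```

Returning the count lets the caller print a warning, and passing `stray_close_seen=False` switches the pass off, in the spirit of the option Adam suggests. The sketch adopts every orphan control up to the next form, so it does not stop at the position of the stray close tag; a real pass would need that position from the parser.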