From: Geoff McLane
To: Karl Dahlke, edbrowse-dev@lists.the-brannons.com
Date: Thu, 22 Dec 2016 21:13:32 +0100
Subject: Re: [Edbrowse-dev]

Hi Karl,

> In an ideal world,

LOL! Well, we all know that does not exist!

Tidy does leave the form open, waiting, as it should, for a close form, but then it hits an opening tr table element and reports:

line 5 column 1 - Warning: missing close form before tr

It is at this point that it *must* close the form... and it carries on parsing the table row, etc.

And that is why tidy emits an error when it does eventually find a close form...

I too have had the thought: does this not tell tidy that the implicit form close it added earlier was wrong? But what can it do about it at that stage?

> postmuck with the tree

Yes, I hear you! That is *not* fun, and as you point out, in fixing one page you can break so many others...

> Using libtidy

You know, for a long time I have wondered why you do not write your own HTML parser!

Not that I particularly want you to abandon libtidy... your participation has helped solve some libtidy problems, and I do hope you continue...
But like any standard HTML browser (IE, Firefox, Chrome, whoever), you are not really interested in how well-formed a document is... browsers can just skip over many problems...

If necessary, you could maybe leverage code from text-based web browsers like Lynx, but in my experimentation with some of these, they too can get very hairy...

It is just that once you have the HTML text in a buffer, parsing basically consists of looking for the `<` and the `>`, with not too many exceptions...

I have done this, with reasonable success, in several Perl scripts I have written... as I am sure you have too; I remember it in your first Perl version...

But I understand this is a long, LONG way around... quite a lot of new work initially...

But libtidy is always going to give you problems when it runs into invalid HTML, and its efforts to make it valid...

Just some thoughts... Sorry I cannot seem to help more...

Regards,
Geoff.
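The idea of scanning a buffer for the `<` and the `>` "with not too many exceptions" can be sketched in C (the language edbrowse and libtidy are written in). This is a minimal illustration under stated assumptions, not edbrowse or libtidy code: the names `scan_html`, `token`, `tok_kind` and `starts_tag` are hypothetical. The only exceptions handled are comments (which may contain `>`), quoted attribute values (which may contain `>`), and a stray `<` in running text. It does no entity decoding, has no raw-text handling for `script`/`style`, builds no tree, and, like a browser, reports no errors.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Kinds of token produced by this hypothetical scanner. */
typedef enum { TOK_TEXT, TOK_TAG, TOK_COMMENT } tok_kind;

typedef struct {
    tok_kind kind;
    const char *start; /* points into the original buffer (not copied) */
    size_t len;        /* length in bytes, including < > for tags/comments */
} token;

/* A '<' only opens markup if followed by a letter, '/', '!' or '?';
 * otherwise (e.g. "x < y") it is treated as ordinary text. */
static int starts_tag(const char *p)
{
    return p[0] == '<' &&
           (isalpha((unsigned char)p[1]) || p[1] == '/' || p[1] == '!' ||
            p[1] == '?');
}

/* Split an HTML buffer into text runs, tags and comments.
 * Writes at most max tokens to out and returns how many were written. */
size_t scan_html(const char *p, token *out, size_t max)
{
    size_t n = 0;
    while (*p && n < max) {
        const char *start = p;
        if (!starts_tag(p)) {
            /* Text run: advance at least one char, stop at the next real tag. */
            do
                p++;
            while (*p && !starts_tag(p));
            out[n++] = (token){ TOK_TEXT, start, (size_t)(p - start) };
        } else if (strncmp(p, "<!--", 4) == 0) {
            /* Comment: '>' may appear inside, so search for "-->".
             * An unterminated comment runs to the end of the buffer. */
            const char *end = strstr(p + 4, "-->");
            p = end ? end + 3 : p + strlen(p);
            out[n++] = (token){ TOK_COMMENT, start, (size_t)(p - start) };
        } else {
            /* Tag: scan to the closing '>', ignoring any '>' that sits
             * inside a single- or double-quoted attribute value. */
            char quote = 0;
            p++;
            while (*p && (quote || *p != '>')) {
                if (quote) {
                    if (*p == quote)
                        quote = 0;
                } else if (*p == '"' || *p == '\'') {
                    quote = *p;
                }
                p++;
            }
            if (*p == '>')
                p++; /* include the closing '>' in the token */
            out[n++] = (token){ TOK_TAG, start, (size_t)(p - start) };
        }
    }
    return n;
}
```

For example, `scan_html("<p class=\"a>b\">x < y<!-- <b> --></p>", toks, 16)` yields four tokens: the `p` start tag (the quoted `>` does not end it), the text `x < y`, the comment, and the `/p` end tag. Turning such a token stream into a tree, including deciding when an open form is implicitly closed, is where the long way around really begins.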