Date: Mon, 7 Mar 2011 13:57:49 +0100
From: oliver
To: caml-list@inria.fr
Message-ID: <20110307125748.GA4977@siouxsie>
References: <20110306225242.GA9087@siouxsie> <1299500875.30035.31.camel@thinkpad>
In-Reply-To: <1299500875.30035.31.camel@thinkpad>
Subject: Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?

On Mon, Mar 07, 2011 at 01:27:55PM +0100, Gerd Stolpmann wrote:
> On Sunday, 06.03.2011, at 23:52 +0100, oliver wrote:
> > Hello,
> >
> > tried around using the simple-dtd argument
> > for Nethtml.parse.
> >
> > It changes the behaviour compared to
> > the default behaviour, but I could not find out
> > how this works.
> >
> > Someone here who can explain me this
> > argument and describe, how it can be used?
>
> Maybe the HTML specification would be a good reference here:
> http://www.w3.org/TR/1999/REC-html401-19991224.
> You will see there that
> most HTML elements are either an inline element, a block element, or
> both ("flow" element). The grammar of HTML is described in terms of
> these classes. For instance, a P tag (paragraph) is a block element and
> contains block elements whereas B (bold) is an inline element and
> contains inline elements. From this follows that you cannot put a P
> inside a B:

> <b><p>something</p></b>

> is illegal.
>
> The parser needs this information to resolve such input, i.e. do
> something with bad HTML. As HTML allows tag minimization (many end tags
> can be omitted), the parser can read this as:

> <b></b><p>something</p>

> (and the </b> in the input is ignored).
>
> If all start and all end tags are written out, changing the
> simplified_dtd does not make any difference.
>
> There is no normative text that says how to read bad HTML. Because of
> this, it is - to a large degree - an interpretation of HTML what you put
> into simplified_dtd.
>
> > The description IMHO is not sufficient to explain
> > this feature.
>
> I'd say your formal knowledge about HTML is insufficient.
[...]

If the formal HTML spec were sufficient to know the behaviour of the
module, there would be no need for the dtd argument, which, following
your explanation, seems to change the behaviour in such a way that it
does NOT follow the formal specification.

> It is
> impossible to explain all the basics of HTML in the scope of an mli.

Do you mean the basics of the HTML spec, or the many different ways
in which bad HTML is written?!

Ciao,
   Oliver
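
For illustration, the resolution described in the quoted explanation can be sketched in a few lines of plain OCaml. This is only a toy model under stated assumptions, not Nethtml's code: the types `cls`, `token` and `node`, the tables `strict_dtd` and `lax_dtd`, and all function names are hypothetical, and the real simplified_dtd is richer than a name-to-class list. The sketch closes any open inline element before a block element starts, and ignores end tags that match nothing open.

```ocaml
(* Toy model only: none of these names come from Nethtml. *)
type cls = Inline | Block
type token = Start of string | End of string | Text of string
type node = Element of string * node list | Data of string

(* Two hypothetical "DTDs": one treats B as inline, one as block. *)
let strict_dtd = [ ("b", Inline); ("p", Block) ]
let lax_dtd = [ ("b", Block); ("p", Block) ]

(* Unknown elements are treated as inline (an arbitrary choice). *)
let class_of dtd name = try List.assoc name dtd with Not_found -> Inline

(* A stack frame is an open element with its children in reverse order.
   The last frame is a synthetic root named "" and is never closed here. *)
let close_top = function
  | (name, kids) :: (pname, pkids) :: rest ->
      (pname, Element (name, List.rev kids) :: pkids) :: rest
  | stack -> stack

(* Implicitly end every inline element that is open at the top. *)
let rec close_inlines dtd = function
  | (name, _) :: _ :: _ as stack when class_of dtd name = Inline ->
      close_inlines dtd (close_top stack)
  | stack -> stack

(* Close open elements up to and including [name]. *)
let rec close_until name = function
  | (top, _) :: _ :: _ as stack ->
      let stack' = close_top stack in
      if top = name then stack' else close_until name stack'
  | stack -> stack

let step dtd stack = function
  | Text s ->
      (match stack with
       | (name, kids) :: rest -> (name, Data s :: kids) :: rest
       | [] -> [])
  | Start name ->
      (* A block element may not sit inside an inline one. *)
      let stack =
        if class_of dtd name = Block then close_inlines dtd stack else stack
      in
      (name, []) :: stack
  | End name ->
      if List.exists (fun (n, _) -> n = name) stack
      then close_until name stack
      else stack (* stray end tag: ignored *)

let parse dtd tokens =
  let rec finish = function
    | [ (_, kids) ] -> List.rev kids
    | [] -> []
    | stack -> finish (close_top stack)
  in
  finish (List.fold_left (step dtd) [ ("", []) ] tokens)

let rec to_string = function
  | Data s -> s
  | Element (name, kids) ->
      Printf.sprintf "<%s>%s</%s>" name
        (String.concat "" (List.map to_string kids))
        name

let render nodes = String.concat "" (List.map to_string nodes)

(* The token stream for a P written inside a B. *)
let bad = [ Start "b"; Start "p"; Text "something"; End "p"; End "b" ]

let () =
  print_endline (render (parse strict_dtd bad));
  (* prints <b></b><p>something</p> *)
  print_endline (render (parse lax_dtd bad))
  (* prints <b><p>something</p></b> *)
```

With the same input, the two tables give two different trees, which is the sense in which the table is an interpretation of bad HTML. For input that already nests correctly under both tables (a B inside a P, with all tags written out), both give the same tree.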