From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p27CS3bK025608 for ; Mon, 7 Mar 2011 13:28:03 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoICAB9edE3U4xEIkGdsb2JhbACEKaIgFQEBAQEJCQwHEQMirS6QMgKBJYNFdgSIBYdu X-IronPort-AV: E=Sophos;i="4.62,276,1297033200"; d="scan'208";a="89355485" Received: from moutng.kundenserver.de ([212.227.17.8]) by mail4-smtp-sop.national.inria.fr with ESMTP; 07 Mar 2011 13:27:57 +0100 Received: from office1.lan.sumadev.de (dslb-188-097-066-025.pools.arcor-ip.net [188.97.66.25]) by mrelayeu.kundenserver.de (node=mreu1) with ESMTP (Nemesis) id 0LkDhW-1QYN8A46R8-00bWne; Mon, 07 Mar 2011 13:27:57 +0100 Received: from [192.168.5.106] (dslb-188-097-066-025.pools.arcor-ip.net [188.97.66.25]) by office1.lan.sumadev.de (Postfix) with ESMTPA id A3AE15F701; Mon, 7 Mar 2011 13:27:56 +0100 (CET) From: Gerd Stolpmann To: oliver Cc: caml-list@inria.fr In-Reply-To: <20110306225242.GA9087@siouxsie> References: <20110306225242.GA9087@siouxsie> Content-Type: text/plain; charset="UTF-8" Date: Mon, 07 Mar 2011 13:27:55 +0100 Message-ID: <1299500875.30035.31.camel@thinkpad> Mime-Version: 1.0 X-Mailer: Evolution 2.28.1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:8qqdnfPLb3ZFE/3sErNTOe8khdTKu5xPoYRi0bjuOvR wAB5u4Gxeq3cjNEo9zkdbeTihMDsAHWUvP5AeCRgQE3E/hBtVE BPmpyiXLOSINuAeRz/s+Osvjf56vtFNbEoMgKOUxwDyAYdu5r4 IKN1I8rqB3Q1z/4l3pvF9g9xeoc832GvRj2WrcMUkpIK2e/xOp k/NAalFy72JYZQhyLU5BQ== Subject: Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work? Am Sonntag, den 06.03.2011, 23:52 +0100 schrieb oliver: > Hello, > > tried around using the simple-dtd argument > for Nethtme.parse. > > It changes the behaviour compared to > the default behaviour, but I could not find out > how this works. > > Someone here who can explain me this > argument and describe, how it can be used? Maybe the HTML specification would be a good reference here: http://www.w3.org/TR/1999/REC-html401-19991224. You will see there that most HTML elements are either an inline element, a block element, or both ("flow" element). The grammar of HTML is described in terms of these classes. For instance, a P tag (paragraph) is a block element and contains block elements whereas B (bold) is an inline element and contains inline elements. From this follows that you cannot put a P inside a B:

something

is illegal. The parser needs this information to resolve such input, i.e. do something with bad HTML. As HTML allows tag minimization (many end tags can be omitted), the parser can read this as:

something

(and the in the input is ignored). If all start and all end tags are written out, changing the simplified_dtd does not make any difference. There is no normative text that says how to read bad HTML. Because of this, it is - to a large degree - an interpretation of HTML what you put into simplified_dtd. > The description IMHO is not sufficient to explain > this feature. I'd say your formal knowledge about HTML is insufficient. It is impossible to explain all the basics of HTML in the scope of an mli. Gerd > I created a simplified dtd and used it as > is mentioned in the manual. But changing the > Arguments of element-class and model constraint > did not brought any results that make sense to me. > Usint that argument jsut creates a different behaviour than > using no such arg, but more is not clear to me. > > An explanation or a pointer to explanational docs would be fine. > > Ciao, > Oliver > -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------