From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p27CvtAS026768
	for <caml-list@sympa-roc.inria.fr>; Mon, 7 Mar 2011 13:57:55 +0100
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AhgFAGJldE3AbSoIe2dsb2JhbACEKZQjjX0VAQEWIgQhrSKQMg2BGoNFdgQ
X-IronPort-AV: E=Sophos;i="4.62,277,1297033200"; 
   d="scan'208";a="92833377"
Received: from einhorn.in-berlin.de ([192.109.42.8])
  by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 07 Mar 2011 13:57:50 +0100
X-Envelope-From: oliver@first.in-berlin.de
X-Envelope-To: <caml-list@inria.fr>
Received: from first (e178006068.adsl.alicedsl.de [85.178.6.68])
	(authenticated bits=0)
	by einhorn.in-berlin.de (8.13.6/8.13.6/Debian-1) with ESMTP id p27CvnmD029588
	for <caml-list@inria.fr>; Mon, 7 Mar 2011 13:57:49 +0100
Received: by first (Postfix, from userid 1000)
	id 2CE3C15408DE; Mon,  7 Mar 2011 13:57:49 +0100 (CET)
Date: Mon, 7 Mar 2011 13:57:49 +0100
From: oliver <oliver@first.in-berlin.de>
To: caml-list@inria.fr
Message-ID: <20110307125748.GA4977@siouxsie>
References: <20110306225242.GA9087@siouxsie>
 <1299500875.30035.31.camel@thinkpad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <1299500875.30035.31.camel@thinkpad>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Scanned-By: MIMEDefang_at_IN-Berlin_e.V. on 192.109.42.8
Subject: Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?

On Mon, Mar 07, 2011 at 01:27:55PM +0100, Gerd Stolpmann wrote:
> Am Sonntag, den 06.03.2011, 23:52 +0100 schrieb oliver:
> > Hello,
> > 
> > tried around using the simple-dtd argument
> > for Nethtme.parse.
> > 
> > It changes the behaviour compared to
> > the default behaviour, but I could not find out
> > how this works.
> > 
> > Someone here who can explain me this
> > argument and describe, how it can be used?
> 
> Maybe the HTML specification would be a good reference here:
> http://www.w3.org/TR/1999/REC-html401-19991224. You will see there that
> most HTML elements are either an inline element, a block element, or
> both ("flow" element). The grammar of HTML is described in terms of
> these classes. For instance, a P tag (paragraph) is a block element and
> contains block elements whereas B (bold) is an inline element and
> contains inline elements. From this follows that you cannot put a P
> inside a B: <B><P>something</P></B> is illegal.
> 
> The parser needs this information to resolve such input, i.e. do
> something with bad HTML. As HTML allows tag minimization (many end tags
> can be omitted), the parser can read this as: <B></B><P>something</P>
> (and the </B> in the input is ignored).
> 
> If all start and all end tags are written out, changing the
> simplified_dtd does not make any difference.
> 
> There is no normative text that says how to read bad HTML. Because of
> this, it is - to a large degree - an interpretation of HTML what you put
> into simplified_dtd.
> 
> > The description IMHO is not sufficient to explain
> > this feature.
> 
> I'd say your formal knowledge about HTML is insufficient.
[...]

If formal HTML spec is sufficient to know the behaviour of the module,
there would no need to have the dtd-argument, which seems, follwoinjg your
explanations, to change the behavior in a way that it does NOT follow
the formal specifications.



> It is
> impossible to explain all the basics of HTML in the scope of an mli.

Do you mean the basics of html spec or the many different ways,
bad html is written?!


Ciao,
   Oliver