[Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?

caml-list - the Caml user's mailing list

* [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
@ 2011-03-06 22:52 oliver
  2011-03-07 12:27 ` Gerd Stolpmann
  0 siblings, 1 reply; 10+ messages in thread
From: oliver @ 2011-03-06 22:52 UTC (permalink / raw)
  To: caml-list

Hello,

I have been experimenting with the dtd argument (of type simplified_dtd)
of Nethtml.parse.

It changes the behaviour compared to
the default behaviour, but I could not find out
how this works.

Is there someone here who can explain this
argument to me and describe how it can be used?

The description IMHO is not sufficient to explain
this feature.

I created a simplified DTD and used it as
described in the manual. But changing the
element_class and model_constraint values
did not produce any results that make sense to me.
Using that argument just creates a different behaviour than
using no such argument; beyond that, nothing is clear to me.

  An explanation or a pointer to explanatory docs would be welcome.

Ciao,
  Oliver


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-06 22:52 [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work? oliver
@ 2011-03-07 12:27 ` Gerd Stolpmann
  2011-03-07 12:57   ` oliver
  2011-03-07 15:40   ` Yoann Padioleau
  0 siblings, 2 replies; 10+ messages in thread
From: Gerd Stolpmann @ 2011-03-07 12:27 UTC (permalink / raw)
  To: oliver; +Cc: caml-list

Am Sonntag, den 06.03.2011, 23:52 +0100 schrieb oliver:
> Hello,
> 
> I have been experimenting with the dtd argument (of type simplified_dtd)
> of Nethtml.parse.
> 
> It changes the behaviour compared to
> the default behaviour, but I could not find out
> how this works.
> 
> Is there someone here who can explain this
> argument to me and describe how it can be used?

Maybe the HTML specification would be a good reference here:
http://www.w3.org/TR/1999/REC-html401-19991224. You will see there that
most HTML elements are either an inline element, a block element, or
both ("flow" element). The grammar of HTML is described in terms of
these classes. For instance, a P tag (paragraph) is a block element and
contains inline elements whereas B (bold) is an inline element and
contains inline elements. From this follows that you cannot put a P
inside a B: <B><P>something</P></B> is illegal.

The parser needs this information to resolve such input, i.e. do
something with bad HTML. As HTML allows tag minimization (many end tags
can be omitted), the parser can read this as: <B></B><P>something</P>
(and the </B> in the input is ignored).
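
To make the idea concrete, here is a minimal, self-contained sketch of how
element classes can drive the repair of badly nested input. It is a toy model
and not Nethtml's code or algorithm: the types cls, model, token and node and
the function parse are all invented for this illustration, and unknown
elements are simply assumed to be inline with flow content.

```ocaml
(* Toy model only: this is not Nethtml's code or algorithm. *)
type cls = Inline | Block                 (* class of an element *)
type model = Only_inline | Flow           (* what an element may contain *)
type token = Open of string | Close of string | Text of string
type node = Element of string * node list | Data of string

(* Assumption: elements missing from the DTD are inline with flow content. *)
let lookup dtd name =
  try List.assoc name dtd with Not_found -> (Inline, Flow)

let allows model cls =
  match model, cls with
  | Flow, _ | Only_inline, Inline -> true
  | Only_inline, Block -> false

let parse dtd tokens =
  (* State: stack of open elements (innermost first, children reversed)
     and the reversed list of finished top-level nodes. *)
  let add node (stack, top) =
    match stack with
    | (name, kids) :: rest -> ((name, node :: kids) :: rest, top)
    | [] -> ([], node :: top)
  in
  (* Close the innermost open element and attach it to its parent. *)
  let pop (stack, top) =
    match stack with
    | (name, kids) :: rest -> add (Element (name, List.rev kids)) (rest, top)
    | [] -> ([], top)
  in
  (* Implicitly close open elements that may not contain class [c]. *)
  let rec make_room c ((stack, _) as st) =
    match stack with
    | (name, _) :: _ when not (allows (snd (lookup dtd name)) c) ->
        make_room c (pop st)
    | _ -> st
  in
  (* Close elements up to and including [name]. *)
  let rec close_until name ((stack, _) as st) =
    match stack with
    | (n, _) :: _ -> if n = name then pop st else close_until name (pop st)
    | [] -> st
  in
  let step ((stack, _) as st) = function
    | Text s -> add (Data s) st
    | Open name ->
        let stack', top' = make_room (fst (lookup dtd name)) st in
        ((name, []) :: stack', top')
    | Close name ->
        (* An end tag without a matching open element is ignored. *)
        if List.mem_assoc name stack then close_until name st else st
  in
  let rec finish ((stack, top) as st) =
    if stack = [] then List.rev top else finish (pop st)
  in
  finish (List.fold_left step ([], []) tokens)

(* <B><P>something</P></B> as a token list *)
let input = [ Open "b"; Open "p"; Text "something"; Close "p"; Close "b" ]

(* B may contain only inline elements: the P forces B to be closed. *)
let strict =
  parse [ "b", (Inline, Only_inline); "p", (Block, Only_inline) ] input
(* strict = [Element ("b", []); Element ("p", [Data "something"])] *)

(* Same input, but B is declared to accept flow content: P stays inside B. *)
let lenient =
  parse [ "b", (Inline, Flow); "p", (Block, Only_inline) ] input
(* lenient = [Element ("b", [Element ("p", [Data "something"])])] *)
```

In this sketch the same input yields two different trees depending only on
the table passed in, which is the kind of effect a changed DTD has on badly
nested input.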

If all start and all end tags are written out, changing the
simplified_dtd does not make any difference.

There is no normative text that says how to read bad HTML. Because of
this, what you put into simplified_dtd is - to a large degree - an
interpretation of HTML.

> The description IMHO is not sufficient to explain
> this feature.

I'd say your formal knowledge about HTML is insufficient. It is
impossible to explain all the basics of HTML in the scope of an mli.

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 12:27 ` Gerd Stolpmann
@ 2011-03-07 12:57   ` oliver
  2011-03-07 13:40     ` Gerd Stolpmann
  2011-03-07 15:40   ` Yoann Padioleau
  1 sibling, 1 reply; 10+ messages in thread
From: oliver @ 2011-03-07 12:57 UTC (permalink / raw)
  To: caml-list

On Mon, Mar 07, 2011 at 01:27:55PM +0100, Gerd Stolpmann wrote:
> Am Sonntag, den 06.03.2011, 23:52 +0100 schrieb oliver:
> > The description IMHO is not sufficient to explain
> > this feature.
> 
> I'd say your formal knowledge about HTML is insufficient.
[...]

If the formal HTML spec were sufficient to know the behaviour of the module,
there would be no need for the dtd argument, which seems, following your
explanations, to change the behaviour in a way that does NOT follow
the formal specification.



> It is
> impossible to explain all the basics of HTML in the scope of an mli.

Do you mean the basics of the HTML spec or the many different ways
in which bad HTML is written?!


Ciao,
   Oliver


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 12:57   ` oliver
@ 2011-03-07 13:40     ` Gerd Stolpmann
  2011-03-07 14:44       ` oliver
  0 siblings, 1 reply; 10+ messages in thread
From: Gerd Stolpmann @ 2011-03-07 13:40 UTC (permalink / raw)
  To: oliver; +Cc: caml-list

Am Montag, den 07.03.2011, 13:57 +0100 schrieb oliver:
> On Mon, Mar 07, 2011 at 01:27:55PM +0100, Gerd Stolpmann wrote:
> > I'd say your formal knowledge about HTML is insufficient.
> [...]
> 
> If the formal HTML spec were sufficient to know the behaviour of the module,
> there would be no need for the dtd argument, which seems, following your
> explanations, to change the behaviour in a way that does NOT follow
> the formal specification.

There is no standard regarding that (except that HTML is also SGML, and
there are some rules for that in SGML). You could also reject bad HTML.
But this has not become common practice (unlike for XML, for instance).

So, it depends on your HTML documents how you want to fix bad HTML.
That's the reason why you can configure it.

> > It is
> > impossible to explain all the basics of HTML in the scope of an mli.
> 
> Do you mean the basics of html spec or the many different ways,
> bad html is written?!

This is connected to some degree - if you define a spec you also define
ways to violate it. For example, the spec defines for each element
whether the start tag, the end tag, or both can be omitted. But what to
do if this is not done correctly?

The formal basis here would be the DTD (document type definition). If
you know what a DTD is, you also know the problem of omitting tags. If I
included all that in the mli, it would be 1000 lines long, and would
contain so much information that it would be unclear what the relevant
part is. In short - I don't want to write books in mli's. The user needs
background information, but the mli is not the right place to give
it.

The specific problem with HTML is that everybody knows "something" about
it, but most knowledge is second-hand. However, HTML is a systematic
definition (with syntax and semantics), and everybody who knows that
will also be able to read my mli.

Gerd

* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 13:40     ` Gerd Stolpmann
@ 2011-03-07 14:44       ` oliver
  2011-03-07 14:53         ` oliver
  2011-03-07 15:14         ` Gerd Stolpmann
  0 siblings, 2 replies; 10+ messages in thread
From: oliver @ 2011-03-07 14:44 UTC (permalink / raw)
  To: caml-list

On Mon, Mar 07, 2011 at 02:40:37PM +0100, Gerd Stolpmann wrote:
> There is no standard regarding that (except that HTML is also SGML, and
> there are some rules for that in SGML). You could also reject bad HTML.
> But this has not become common practice (unlike for XML, for instance).
> 
> So, it depends on your HTML documents how you want to fix bad HTML.
> That's the reason why you can configure it.
[...]

But it's not mentioned how the dtd argument works.

Does it change the behaviour only for those tags that
are mentioned in the dtd-argument?
What about the other args? Will they stay as before (default)?

Or will using the dtd-arg change the parsing to some kind of
very relaxed fallback, and I add constraints only to the mentioned
tags?

So does the dtd-arg widen or narrow the default dtd or just replace the
default settings? What about tags that are not included in the dtd-arg?

I doubt that the behaviour of your module can be found in the HTML spec.

If the behaviour is so obvious, I wonder why nobody else could answer
the question (neither here nor on IRC).


> 
> > > It is
> > > impossible to explain all the basics of HTML in the scope of an mli.
> > 
> > Do you mean the basics of html spec or the many different ways,
> > bad html is written?!
> 
> This is connected to some degree - if you define a spec you also define
> ways how to violate it. For example, the spec defines for each element
> whether the start tag, the end tag, or both can be omitted. But what to
> do if this is not done correctly?

I could write my own parser, but prefer to use modules that help me.
For that the docs should explain what the arguments do.

I know that there is a DTD for HTML and that most HTML does not conform to it.

But how your parser works and how that dtd argument must be used is not clear to me.



> 
> The formal basics would here be the DTD (document type definition). If
> you know what a DTD is, you also know the problem of omitting tags. If I
> included all that in the mli, it would be 1000 lines long, and would
> contain that much information that it would be unclear what the relevant
> part is. In short - I don't want to write books in mli's. The user needs
> background information, but the mli is not the right place where to give
> it.

But the docs could explain the argument behaviour.


> 
> The specific problem with HTML is that everybody knows "something" about
> it, but most knowledge is second-hand. However, HTML is a systematic
> definition (with syntax and semantics), and everybody who knows that
> will also be able to read my mli.

Aha, that's why nobody answered my question.

Maybe it's not that obvious.

Ciao,
   Oliver


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 14:44       ` oliver
@ 2011-03-07 14:53         ` oliver
  2011-03-07 15:14         ` Gerd Stolpmann
  1 sibling, 0 replies; 10+ messages in thread
From: oliver @ 2011-03-07 14:53 UTC (permalink / raw)
  To: caml-list

On Mon, Mar 07, 2011 at 03:44:15PM +0100, oliver wrote:
> But it's not mentioned how the dtd-Argument works.
[...]


For example, if you use the empty string "" as a tag,
it changes the parsing behaviour, even though I doubt there is a tag
named "".

The same problem occurs with any made-up tag.

If I only change non-existent tags, why does the module parse correct HTML
differently when given such a dtd argument, compared to no dtd argument?

I doubt I can find the answer in the HTML spec.

Ciao,
   Oliver


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 14:44       ` oliver
  2011-03-07 14:53         ` oliver
@ 2011-03-07 15:14         ` Gerd Stolpmann
  2011-03-07 20:18           ` oliver
  1 sibling, 1 reply; 10+ messages in thread
From: Gerd Stolpmann @ 2011-03-07 15:14 UTC (permalink / raw)
  To: oliver; +Cc: caml-list

Am Montag, den 07.03.2011, 15:44 +0100 schrieb oliver:
> But it's not mentioned how the dtd-Argument works.
> 
> Does it change the behaviour only for those tags that
> are mentioned in the dtd-argument?
> What about the other args? Will they stay as before (default)?

I think this is pretty clear: if you set the dtd arg, you pass a
completely new dtd in, overriding any default. This is how optional
arguments work in OCaml.

So, if you only want to change something, take the provided html40_dtd
or relaxed_html40_dtd values, and apply a change to them.
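
As a minimal sketch of "take a provided value and apply a change": a
simplified DTD is an association list from element names to a pair of
element class and model constraint, so replacing one entry is ordinary list
manipulation. The helper override, the reduced local types and the tiny
base_dtd below are hypothetical stand-ins so that the snippet compiles
without ocamlnet; they are not part of the library.

```ocaml
(* Reduced stand-ins for the Nethtml types, for illustration only. *)
type element_class = [ `Inline | `Block ]
type model_constraint = [ `Inline | `Block | `Flow | `Empty ]
type dtd = (string * (element_class * model_constraint)) list

(* Replace (or add) the entry for one element in an association-list DTD.
   The function is polymorphic and does not depend on ocamlnet. *)
let override name entry dtd =
  (name, entry) :: List.remove_assoc name dtd

(* Hypothetical stand-in for a default DTD; with ocamlnet one would start
   from Nethtml.relaxed_html40_dtd instead. *)
let base_dtd : dtd =
  [ "b", (`Inline, `Inline);
    "p", (`Block, `Inline);
    "div", (`Block, `Flow) ]

(* Same as base_dtd, except that B is now declared to accept flow content. *)
let my_dtd : dtd = override "b" (`Inline, `Flow) base_dtd
```

Assuming ocamlnet is installed, the same override function should apply
unchanged to Nethtml.relaxed_html40_dtd, and the result can then be passed
as Nethtml.parse ~dtd:my_dtd. Every element left out of the list passed in
is unknown to the parser, because the argument replaces the default rather
than extending it.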

> Or will using the dtd-arg change the parsing to some kind of
> very relaxed fallback, and I add constraints only to the mentioned
> tags?
> 
> So does the dtd-arg widen or narrow the default dtd or just replace the
> default settings? What about tags that are not included in the dtd-arg?
> 
> I doubt that behaviour of your module can be found in the HTML-spec.

No, this is OCaml.

Btw, you could have easily answered that yourself by looking at the
source. I mean just in case you do not see the obvious.

Gerd


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 12:27 ` Gerd Stolpmann
  2011-03-07 12:57   ` oliver
@ 2011-03-07 15:40   ` Yoann Padioleau
  2011-03-07 16:24     ` Gerd Stolpmann
  1 sibling, 1 reply; 10+ messages in thread
From: Yoann Padioleau @ 2011-03-07 15:40 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: oliver, caml-list


On Mar 7, 2011, at 4:27 AM, Gerd Stolpmann wrote:

> I'd say your formal knowledge about HTML is insufficient. It is
> impossible to explain all the basics of HTML in the scope of an mli.

Well, the explanation you've given above, with a link to the HTML spec and
the inline vs. block comment, is excellent and would have been a good fit for a comment
in a .mli IMHO.


* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 15:40   ` Yoann Padioleau
@ 2011-03-07 16:24     ` Gerd Stolpmann
  0 siblings, 0 replies; 10+ messages in thread
From: Gerd Stolpmann @ 2011-03-07 16:24 UTC (permalink / raw)
  To: Yoann Padioleau; +Cc: oliver, caml-list

Am Montag, den 07.03.2011, 07:40 -0800 schrieb Yoann Padioleau:
> Well, the explanation you've given above, with a link to the HTML spec and
> the inline vs. block comment, is excellent and would have been a good fit for a comment
> in a .mli IMHO.

Thanks for the suggestion. Just copied the text to the mli.

Gerd

> > 
> > Gerd
> > 
> >> I created a simplified dtd and used it as
> >> is mentioned in the manual. But changing the
> >> Arguments of element-class and model constraint
> >> did not brought any results that make sense to me.
> >> Usint that argument jsut creates a different behaviour than
> >> using no such arg, but more is not clear to me.
> >> 
> >>  An explanation or a pointer to explanational docs would be fine.
> >> 
> >> Ciao,
> >>  Oliver
> >> 
> > 
> > 
> > -- 
> > ------------------------------------------------------------
> > Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
> > gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
> > Phone: +49-6151-153855                  Fax: +49-6151-997714
> > ------------------------------------------------------------
> > 
> > 
> > -- 
> > Caml-list mailing list.  Subscription management and archives:
> > https://sympa-roc.inria.fr/wws/info/caml-list
> > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> > Bug reports: http://caml.inria.fr/bin/caml-bugs
> > 
> 
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work?
  2011-03-07 15:14         ` Gerd Stolpmann
@ 2011-03-07 20:18           ` oliver
  0 siblings, 0 replies; 10+ messages in thread
From: oliver @ 2011-03-07 20:18 UTC (permalink / raw)
  To: caml-list

On Mon, Mar 07, 2011 at 04:14:18PM +0100, Gerd Stolpmann wrote:
> On Monday, 07.03.2011, at 15:44 +0100, oliver wrote:
> > On Mon, Mar 07, 2011 at 02:40:37PM +0100, Gerd Stolpmann wrote:
> > > On Monday, 07.03.2011, at 13:57 +0100, oliver wrote:
> > > > On Mon, Mar 07, 2011 at 01:27:55PM +0100, Gerd Stolpmann wrote:
> > > > > On Sunday, 06.03.2011, at 23:52 +0100, oliver wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > I tried out the simple-dtd argument
> > > > > > for Nethtml.parse.
> > > > > > 
> > > > > > It changes the behaviour compared to
> > > > > > the default behaviour, but I could not find out
> > > > > > how this works.
> > > > > > 
> > > > > > Is there someone here who can explain this
> > > > > > argument to me and describe how it can be used?
> > > > > 
> > > > > Maybe the HTML specification would be a good reference here:
> > > > > http://www.w3.org/TR/1999/REC-html401-19991224. You will see there that
> > > > > most HTML elements are either an inline element, a block element, or
> > > > > both ("flow" element). The grammar of HTML is described in terms of
> > > > > these classes. For instance, a P tag (paragraph) is a block element and
> > > > > contains block elements whereas B (bold) is an inline element and
> > > > > contains inline elements. From this follows that you cannot put a P
> > > > > inside a B: <B><P>something</P></B> is illegal.
> > > > > 
> > > > > The parser needs this information to resolve such input, i.e. do
> > > > > something with bad HTML. As HTML allows tag minimization (many end tags
> > > > > can be omitted), the parser can read this as: <B></B><P>something</P>
> > > > > (and the </B> in the input is ignored).
> > > > > 
> > > > > If all start and all end tags are written out, changing the
> > > > > simplified_dtd does not make any difference.
> > > > > 
> > > > > There is no normative text that says how to read bad HTML. Because of
> > > > > this, it is - to a large degree - an interpretation of HTML what you put
> > > > > into simplified_dtd.
> > > > > 
> > > > > > The description IMHO is not sufficient to explain
> > > > > > this feature.
> > > > > 
> > > > > I'd say your formal knowledge about HTML is insufficient.
> > > > [...]
> > > > 
> > > > If the formal HTML spec were sufficient to know the behaviour of the module,
> > > > there would be no need for the dtd argument, which seems, following your
> > > > explanations, to change the behaviour in a way that does NOT follow
> > > > the formal specification.
> > > 
> > > There is no standard regarding that (except that HTML is also SGML, and
> > > there are some rules for that in SGML). You could also reject bad HTML.
> > > But this has not become common practice (unlike for XML, for instance).
> > > 
> > > So, it depends on your HTML documents how you want to fix bad HTML.
> > > That's the reason why you can configure it.
> > [...]
> > 
> > But how the dtd argument works is not mentioned.
> > 
> > Does it change the behaviour only for those tags that
> > are mentioned in the dtd-argument?
> > What about the other args? Will they stay as before (default)?
> 
> I think this is pretty clear: if you set the dtd arg, you pass a
> completely new dtd in, overriding any default.
[...]

Thanks for the answer, that was what I wanted to know.


> This is how optional
> arguments work in OCaml.

No, this is how you have implemented it.

An optional argument could just as well be used to override only those entries
of the default dtd that are passed in.

That would work as with a record, where only the changed fields need to be
mentioned and the rest stay as they are, or as with an association list, where
newly added items shadow the old ones because a lookup finds them first, so
only the newly added entries change.
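
To illustrate the difference, here is a minimal sketch of both
behaviours. The table, the string values and the function names are
invented for this mail and are not taken from Nethtml.

```ocaml
(* Hypothetical stand-in for a default dtd: element name -> class. *)
let default_dtd = [ "b", "inline"; "div", "block"; "ul", "block" ]

(* Behaviour 1: the optional argument replaces the whole default table. *)
let lookup_replace ?(dtd = default_dtd) name =
  List.assoc_opt name dtd

(* Behaviour 2: the optional argument only overrides the entries given.
   The overrides are put in front, so a lookup finds them first and
   falls back to the defaults for everything else. *)
let lookup_merge ?(overrides = []) name =
  List.assoc_opt name (overrides @ default_dtd)

let replaced_div = lookup_replace ~dtd:[ "b", "flow" ] "div"
let merged_b     = lookup_merge ~overrides:[ "b", "flow" ] "b"
let merged_div   = lookup_merge ~overrides:[ "b", "flow" ] "div"
```

With the first variant, "div" is no longer known once a one-entry table
is passed in. With the second it keeps its default. A caller can
presumably get the second behaviour from the first by passing
my_entries @ Nethtml.html40_dtd as the dtd, assuming the parser looks
entries up with first-match semantics, which I have not checked.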

[...]
> Btw, you could have easily answered that yourself by looking at the
> source. I mean just for the case that you do not see the obvious.

Yes, I could have looked into the sources, and maybe next time I will,
even though I prefer docs that are clear enough that there is no need to read the sources
of a library.


Ciao,
   Oliver


end of thread, other threads:[~2011-03-07 20:18 UTC | newest]

Thread overview: 10+ messages
2011-03-06 22:52 [Caml-list] ocamlnet: Netheml: simple-dtd: how does this work? oliver
2011-03-07 12:27 ` Gerd Stolpmann
2011-03-07 12:57   ` oliver
2011-03-07 13:40     ` Gerd Stolpmann
2011-03-07 14:44       ` oliver
2011-03-07 14:53         ` oliver
2011-03-07 15:14         ` Gerd Stolpmann
2011-03-07 20:18           ` oliver
2011-03-07 15:40   ` Yoann Padioleau
2011-03-07 16:24     ` Gerd Stolpmann
