caml-list - the Caml user's mailing list
From: Martin Jambon <martin.jambon@ens-lyon.org>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Which types for representing HTML documents ?
Date: Mon, 08 Jun 2009 13:35:50 +0200
Message-ID: <4A2CF796.1070504@ens-lyon.org>
In-Reply-To: <20090608045957.GA7611@pema>

Sébastien Hinderer wrote:
> Dear all,
> 
> According to you, how could an HTML document best be represented in
> OCaml ?

Ocamlnet's Nethtml works fine for parsing:

(* Nethtml.document: an element (name, attributes, children) or raw text *)
type document =
  | Element of (string * (string * string) list * document list)
  | Data of string


If your goal is to interpret arbitrary web pages, you have to allow all kinds
of standard or non-standard elements and attributes anywhere in the document.
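As a rough illustration of why such a loosely typed tree suits arbitrary
pages, here is a minimal sketch that walks it.  The type is redeclared
locally so the example compiles without Ocamlnet; text_of, links and sample
are hypothetical names, and the code assumes element names are lowercase.

```ocaml
(* Local copy of the shape of Nethtml.document, so that this sketch
   compiles without Ocamlnet installed. *)
type document =
  | Element of (string * (string * string) list * document list)
  | Data of string

(* Concatenate all text nodes, in document order. *)
let rec text_of = function
  | Data s -> s
  | Element (_, _, children) -> String.concat "" (List.map text_of children)

(* Collect every href attribute found on an <a> element. *)
let rec links = function
  | Data _ -> []
  | Element (name, attrs, children) ->
      let here =
        if name = "a" then
          match List.assoc_opt "href" attrs with
          | Some url -> [url]
          | None -> []
        else []
      in
      here @ List.concat_map links children

(* Hypothetical sample tree containing a non-standard element and
   attribute; the type accepts them without complaint. *)
let sample =
  Element ("p", [],
    [ Data "See ";
      Element ("a", [("href", "http://mjambon.com/")], [Data "this page"]);
      Element ("blink", [("x-custom", "1")], [Data "!"]) ])
```

Nothing in the type restricts which names or attributes may appear, which
is the property needed when the input is whatever the web happens to serve.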

If you are creating HTML documents, beware that you can't embed Flash objects
using standard HTML.  I'm not even speaking of JavaScript happily manipulating
the DOM tree with few restrictions.


Personally I use text templates and validate web pages once they are in my
browser (using the shortcut to validator.w3.org that Opera provides).  For
JavaScript-generated nodes, I just check that they work in various browsers
(the Firefox "View source chart" extension is useful for debugging the DOM tree).
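A text-template approach can be as small as the sketch below.  This is only
one possible shape, not necessarily the setup described above; escape and
page are hypothetical helpers, and the template string is made up for the
example.

```ocaml
(* Escape the characters that are special in HTML text and in
   double-quoted attribute values. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Hypothetical page template: plain text with holes filled by Printf.
   Nothing here checks that the result is valid HTML; that is left to an
   external validator such as validator.w3.org. *)
let page ~title ~body =
  Printf.sprintf
    "<html><head><title>%s</title></head><body><p>%s</p></body></html>"
    (escape title) (escape body)
```

The only thing the compiler checks is that each hole receives a string;
the markup itself is checked after the fact, in the browser or a validator.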


I do not suffer at all from the absence of static type-checking of the HTML
tree.  I imagine that the reasons for this are:

* HTML is the final product and is trivial to debug (no need to printf
everything since everything is already printed...)

* There are no complicated conditionals that would leave certain parts of the
code untested for a long time.

* Mainstream web browsers are very tolerant.  Small accidental deviations from
the strict W3C standards usually have no visible effect.



> In particular: would you rather use classes or records, polymorphic
> variants or normal constructors ?
> 
> There are attributes which can occur in several elements, such as id,
> class... How should these be represented ?
> Should the types reflect the differences between inline elements and
> other types of elements ?


I know this is going to annoy a lot of people on this list, but this feels
very academic to me :-)
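For context, the kind of encoding the question alludes to could look like
the following sketch, which uses polymorphic variants as phantom types to
separate inline from block elements.  The Html module and all of its
functions are hypothetical, cover only four constructs, and do no escaping.

```ocaml
module Html : sig
  (* The parameter is a phantom tag: `Inline or `Block. *)
  type +'a elt
  val text : string -> [> `Inline ] elt
  val em : [< `Inline ] elt list -> [> `Inline ] elt
  val p : [< `Inline ] elt list -> [> `Block ] elt
  val div : [< `Inline | `Block ] elt list -> [> `Block ] elt
  val to_string : 'a elt -> string
end = struct
  (* Elements are stored as already-rendered strings. *)
  type 'a elt = string
  let wrap tag children =
    Printf.sprintf "<%s>%s</%s>" tag (String.concat "" children) tag
  let text s = s
  let em children = wrap "em" children
  let p children = wrap "p" children
  let div children = wrap "div" children
  let to_string s = s
end

(* Accepted: a paragraph containing only inline content, inside a div. *)
let doc = Html.(div [ p [ text "hi "; em [ text "there" ] ] ])

(* Rejected by the type checker, because a div is not inline:
   let bad = Html.(p [ div [] ]) *)
```

Shared attributes such as id or class would need a similar treatment, which
is where most of the design effort in this style tends to go.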



Martin

-- 
http://mjambon.com/

