caml-list - the Caml user's mailing list
From: Alain Frisch <alain@frisch.fr>
To: Jim Miller <gordon.j.miller@gmail.com>
Cc: caml-list <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
Date: Wed, 30 Jan 2008 08:35:44 +0100
Message-ID: <47A028D0.2000909@frisch.fr>
In-Reply-To: <beed19130801291926u36e7fc30w958d0370c87d3bf0@mail.gmail.com>

Jim Miller wrote:
> type xmlNode =
>  | XmlElement of (namespace: string * tagName: string * attributes:
> (string * string) list * (children:xmlNode list) )
>  | XmlPCData of (text:string)

There was some discussion here a while ago about standardizing XML 
types across OCaml libraries. You might want to look it up in the archives.

Here are some random remarks.

First, you need to specify several things in the type above.

- the encoding of strings; if the parser cannot be configured, I guess 
that normalizing everything to UTF-8 is the most natural choice.

- the handling of namespaces; does the first argument to XmlElement 
refer to the namespace prefix as used in the document (which would make 
matching impossible, because the document can use arbitrary prefixes), a 
normalized version (you'd need to provide the parser with more info), or 
the namespace URI (which makes pattern matching quite tedious)? Also, it 
is sometimes necessary to keep the [prefix->uri] dictionary available 
at every node (e.g. to deal with XML Schema documents, where prefixes 
can be used in attribute values). Moreover, some XML documents may be 
valid w.r.t. the XML spec without conforming to the XML Namespaces one.

- whether adjacent XmlPCData nodes are allowed or not.

- whether the parser performs whitespace normalization (and how).
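
To make two of these choices concrete, here is a minimal sketch of one 
possible answer: elements carry the namespace URI rather than the prefix, 
and adjacent PCData nodes are merged. All names below (name, node, env, 
resolve, normalize) are hypothetical and not taken from any existing 
library. The resolver is simplified: it applies the default namespace to 
any unprefixed name, which is only right for element names.

```ocaml
(* Hypothetical types for illustration only. *)

(* Expanded name: namespace URI plus local part. *)
type name = { uri : string; local : string }

type node =
  | Element of name * (name * string) list * node list
  | PCData of string  (* assumed to be UTF-8 *)

(* Prefix -> URI bindings in scope, innermost binding first.
   The default namespace is stored under the empty prefix "". *)
type env = (string * string) list

(* Resolve a qualified name such as "xs:element" against the bindings
   in scope. Returns None when the prefix is unbound. An unprefixed
   name with no default namespace gets the empty URI. *)
let resolve (env : env) (qname : string) : name option =
  let prefix, local =
    match String.index_opt qname ':' with
    | Some i ->
        ( String.sub qname 0 i,
          String.sub qname (i + 1) (String.length qname - i - 1) )
    | None -> ("", qname)
  in
  match List.assoc_opt prefix env with
  | Some uri -> Some { uri; local }
  | None -> if prefix = "" then Some { uri = ""; local } else None

(* Merge adjacent PCData nodes, recursively, so that a consumer never
   sees two of them in a row. *)
let rec normalize (nodes : node list) : node list =
  match nodes with
  | PCData a :: PCData b :: rest -> normalize (PCData (a ^ b) :: rest)
  | PCData s :: rest -> PCData s :: normalize rest
  | Element (n, attrs, children) :: rest ->
      Element (n, attrs, normalize children) :: normalize rest
  | [] -> []
```

Keeping the env value around at each node (or passing it to callbacks) is 
what would let a client resolve prefixes that appear inside attribute 
values, as in XML Schema.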


Also, in many cases, the client of the parser might want to get more 
information, like locations in the source document.
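
One way to leave room for such information, sketched here under my own 
assumptions (the loc record and the type parameter are hypothetical, not 
part of the proposal), is to parameterize the tree over an annotation 
carried by every node:

```ocaml
(* Hypothetical source position; a real parser might also track
   byte offsets or a file name. *)
type loc = { line : int; column : int }

(* The tree is polymorphic in the annotation 'a attached to each node. *)
type 'a node =
  | Element of 'a * string * (string * string) list * 'a node list
  | PCData of 'a * string

(* Return the annotation stored at a node. *)
let annot : 'a node -> 'a = function
  | Element (a, _, _, _) | PCData (a, _) -> a

(* Drop all annotations, for clients that do not need them. *)
let rec strip : 'a node -> unit node = function
  | Element (_, tag, attrs, children) ->
      Element ((), tag, attrs, List.map strip children)
  | PCData (_, text) -> PCData ((), text)
```

A parser would then return a loc node, while code that builds documents 
from an internal representation could work with unit node.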

If you intend to use the same type to produce XML documents from an 
internal representation, I think you might want to add an extra constructor:

   | XmlMany of xmlNode list

This makes it much easier to build and compose XML fragments in a 
modular way.
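
As a rough illustration, assuming the quoted type is written as a plain 
OCaml variant (the field labels dropped) and using a hypothetical flatten 
helper, a printer could splice XmlMany groups into their parent before 
output:

```ocaml
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string
  | XmlMany of xmlNode list

(* Splice every XmlMany group into its surrounding list, so that the
   result contains no XmlMany constructor at any depth. *)
let rec flatten (n : xmlNode) : xmlNode list =
  match n with
  | XmlMany ns -> List.concat (List.map flatten ns)
  | XmlElement (ns, tag, attrs, children) ->
      [ XmlElement (ns, tag, attrs, List.concat (List.map flatten children)) ]
  | XmlPCData _ -> [ n ]

(* A fragment builder can return any number of nodes as one value. *)
let items (labels : string list) : xmlNode =
  XmlMany (List.map (fun s -> XmlElement ("", "li", [], [ XmlPCData s ])) labels)

let doc = XmlElement ("", "ul", [], [ items [ "a"; "b" ]; XmlMany [] ])
```

Here items can return zero, one or several nodes without its caller 
having to concatenate lists by hand.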

Also, you need to specify how the XML printer is supposed to deal with 
namespaces.



-- Alain

