caml-list - the Caml user's mailing list
From: "Jim Miller" <gordon.j.miller@gmail.com>
To: "caml-list List" <caml-list@inria.fr>
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
Date: Tue, 29 Jan 2008 22:26:00 -0500
Message-ID: <beed19130801291926u36e7fc30w958d0370c87d3bf0@mail.gmail.com>
In-Reply-To: <52FAAA41-5B70-4F87-9F83-B8A96EA48D34@erratique.ch>

> As such I'm not sure such an interface is really feasible. Now if you
> see a common pattern or concrete type signatures that could be changed
> to make parsers more compatible do not hesitate to communicate them.
> If it benefits the users of my parser and remains in its philosophy
> I'll happily implement them. But _you_ have to make concrete proposal,
> I'm not going to research this. Please do not just initiate a
> discussion because you like the abstract idea of being able to swap
> xml parser implementations, make proposals.
>
> Best,
>
>

Fair enough, I'll start with a proposal on the topic, though since it
is late I'm not going to go too deep.  If it gets through a first round
of discussion, I'll start a node on the wiki and be happy to take
point on maintaining a document based on feedback.

My interest in this is based on my experience in dealing with XML.
90% of what I need to do is parse simple documents with a known
structure that come from files, strings, or the network.
It's also based on the responses I've received when attempting to
evangelize OCaml to a crowd whose first task is typically to
connect to the network, read some XML, do some processing on the XML,
and generate a response.

The purpose of this minimum implementation is to provide a common API
to perform the following tasks:

- Define a simple type that can be used to construct a tree
representing XML data.
- Parse an existing XML document into a simple data structure allowing
access to the data
- Manipulate the result of parsing the XML document
- Construct simple XML documents

XML parser implementations are free to expand beyond this
implementation; this is merely a recommendation for a minimum
implementation.

type xmlNode =
  (* namespace * tagName * attributes * children *)
  | XmlElement of string * string * (string * string) list * xmlNode list
  (* text *)
  | XmlPCData of string
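As a minimal sketch of how a value of this type might be built and inspected, the block below writes the type without field labels so that it compiles on its own. The document content and the helper name tag_name are illustrative assumptions, not part of the proposal.

```ocaml
(* The proposed type, with the field names kept only as comments. *)
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
    (* namespace, tagName, attributes, children *)
  | XmlPCData of string (* text *)

(* <greeting lang="en">Hello</greeting>, using an empty namespace. *)
let greeting =
  XmlElement ("", "greeting", [ ("lang", "en") ], [ XmlPCData "Hello" ])

(* Hypothetical helper: pattern matching gives direct access to the data. *)
let tag_name = function
  | XmlElement (_, name, _, _) -> Some name
  | XmlPCData _ -> None
```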

with the following functions to parse data from different kinds of
sources.  By default, parsing should be non-validating but must still
ensure well-formedness.

val parse_file : string -> xmlNode
val parse_string : string -> xmlNode
val parse_channel : Pervasives.in_channel -> xmlNode

val to_string : xmlNode -> string

val iter : (xmlNode -> unit) -> xmlNode -> unit
val map : (xmlNode -> 'a) -> xmlNode -> 'a list
val fold : ('a -> xmlNode -> 'a) -> 'a -> xmlNode -> 'a
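The signatures above do not say in which order nodes are visited. The sketch below assumes a pre-order walk (the node itself, then its children in document order) and shows one way iter, map and fold could be written against the proposed type. The traversal order and the helper text_of are assumptions made for illustration.

```ocaml
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string

(* Assumption: visit the node first, then each child, left to right. *)
let rec fold f acc node =
  let acc = f acc node in
  match node with
  | XmlElement (_, _, _, children) -> List.fold_left (fold f) acc children
  | XmlPCData _ -> acc

(* iter and map are both expressed through fold. *)
let iter f node = fold (fun () n -> f n) () node

let map f node = List.rev (fold (fun acc n -> f n :: acc) [] node)

(* Hypothetical helper: concatenate all character data in the tree. *)
let text_of node =
  fold
    (fun acc n ->
      match n with
      | XmlPCData s -> acc ^ s
      | XmlElement _ -> acc)
    "" node
```

Under this reading, map returns one result per node in the tree, which matches the proposed 'a list return type.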

Additional sections/areas that would have to be defined:

o Handling errors while parsing

o Validation.

I personally prefer having a separate set of functions that perform the
parsing with validation so that it's obvious to me what is being
performed when I invoke a function.  I would be content with optional
arguments to the parse_ functions defined above, provided the
default is to not validate.
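The two alternatives could look roughly like this as a module signature. This is only a sketch: the signature name PARSER, the label ?validate and the function name parse_string_validating are hypothetical and appear nowhere in an existing library.

```ocaml
module type PARSER = sig
  type xmlNode =
    | XmlElement of string * string * (string * string) list * xmlNode list
    | XmlPCData of string

  (* Alternative A: an optional argument. Omitting it means "do not
     validate", so existing call sites keep their meaning. *)
  val parse_file : ?validate:bool -> string -> xmlNode
  val parse_string : ?validate:bool -> string -> xmlNode
  val parse_channel : ?validate:bool -> in_channel -> xmlNode

  (* Alternative B: a separately named entry point, so that validation
     is visible in the function name at the call site. *)
  val parse_string_validating : string -> xmlNode
end
```

With alternative A a caller writes parse_string ~validate:true s to opt in, while a plain parse_string s stays non-validating.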

o Callback/SAX style API

This is where I believe significant differences exist between XML
implementations.  I'm sure that the most that can be done here is
to standardize the names of the functions and types that are used.
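To make the idea of shared names concrete, here is one possible shape for an event type, together with a function that replays an already-built tree as a stream of callbacks. Every name in it (event, Start_element, End_element, Text, emit) is a hypothetical illustration, not taken from any existing parser.

```ocaml
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string

(* Hypothetical event type for a callback-style interface. *)
type event =
  | Start_element of string * string * (string * string) list
    (* namespace, tagName, attributes *)
  | End_element of string * string (* namespace, tagName *)
  | Text of string

(* Walk a tree and hand each event to the callback, in document order. *)
let rec emit callback node =
  match node with
  | XmlPCData s -> callback (Text s)
  | XmlElement (ns, name, attrs, children) ->
      callback (Start_element (ns, name, attrs));
      List.iter (emit callback) children;
      callback (End_element (ns, name))
```

A real callback parser would produce such events directly from the input rather than from a tree; the tree walk is used here only so the sketch stays self-contained.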

