From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 0974DBC6C for ; Wed, 30 Jan 2008 08:40:44 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAACa4n0fUGyodi2dsb2JhbACQJwEBAQgEBgkIEQefNg X-IronPort-AV: E=Sophos;i="4.25,276,1199660400"; d="scan'208";a="8524654" Received: from smtp3-g19.free.fr ([212.27.42.29]) by mail3-smtp-sop.national.inria.fr with ESMTP; 30 Jan 2008 08:40:43 +0100 Received: from smtp3-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp3-g19.free.fr (Postfix) with ESMTP id 2224917B54F; Wed, 30 Jan 2008 08:40:43 +0100 (CET) Received: from [192.168.0.3] (rke75-3-82-229-183-156.fbx.proxad.net [82.229.183.156]) by smtp3-g19.free.fr (Postfix) with ESMTP id E620917B53C; Wed, 30 Jan 2008 08:40:42 +0100 (CET) Message-ID: <47A028D0.2000909@frisch.fr> Date: Wed, 30 Jan 2008 08:35:44 +0100 From: Alain Frisch User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Jim Miller Cc: caml-list Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API References: <52FAAA41-5B70-4F87-9F83-B8A96EA48D34@erratique.ch> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; frisch:01 frisch:01 ocaml:01 parser:01 namespaces:01 normalized:01 parser:01 dictionnary:01 node:01 namespaces:01 nodes:01 modular:01 wrote:01 caml-list:01 constructor:01 Jim Miller wrote: > type xmlNode = > | XmlElement of (namespace: string * tagName: string * attributes: > (string * string) list * (children:xmlNode list) ) > | XmlPCData of (text:string) There has been some discussions here a while ago about standardizing XML types across OCaml libraries. You might want to look up the archives. Here are some random remarks. First, you need to specify several things in the type above. - the encoding of strings; if the parser cannot be configured, I guess that normalizing everything to utf-8 is the most natural choice. - the handling of namespaces; does the first argument to XmlElement refers to the namespace prefix as used in the document (it'd make matching impossible because the document can use arbitrary prefixes), a normalized version (you'd need to provide the parser with more info), or the namespace URI (which makes pattern matching quite tedious). Also, it is sometimes necessary to keep the [prefix->uri] dictionnary available in at every node (e.g. to deal with XML Schema documents, where prefixes can be used in attribute values). Moreover, some XML documents may be valid w.r.t. to the XML spec without conforming to the XML Namespaces one. - whether adjacent XmlPCData nodes are allowed or not. - whether the parser performs whitespace normalization (and how). Also, in many cases, the client of the parser might want to get more information, like locations in the source document. If you intend to use the same type to produce XML documents from an internal representation, I think you might want to add an extra constructor: | XmlMany of xmlNode list This makes it much easier to build and compose XML fragments in a modular way. Also, you need to specify how the XML printer is supposed to deal with namespaces. -- Alain