From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 005CCBC6C for ; Wed, 6 Feb 2008 22:58:02 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAAm8qUfAXQImh2dsb2JhbACQLgEBAQgKKYEVmyw X-IronPort-AV: E=Sophos;i="4.25,314,1199660400"; d="scan'208";a="7730761" Received: from discorde.inria.fr ([192.93.2.38]) by mail1-smtp-roc.national.inria.fr with ESMTP; 06 Feb 2008 22:58:13 +0100 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id m16Lw2eP023671 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Wed, 6 Feb 2008 22:58:02 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAAm8qUfUGyodi2dsb2JhbACQLgEBAQgEBgkIEQeBFZss X-IronPort-AV: E=Sophos;i="4.25,314,1199660400"; d="scan'208";a="7016172" Received: from smtp3-g19.free.fr ([212.27.42.29]) by mail2-smtp-roc.national.inria.fr with ESMTP; 06 Feb 2008 22:58:01 +0100 Received: from smtp3-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp3-g19.free.fr (Postfix) with ESMTP id 97F8B17D261; Wed, 6 Feb 2008 22:58:01 +0100 (CET) Received: from [192.168.0.3] (rke75-3-82-229-183-156.fbx.proxad.net [82.229.183.156]) by smtp3-g19.free.fr (Postfix) with ESMTP id EEC0D17BB0F; Wed, 6 Feb 2008 22:58:00 +0100 (CET) Message-ID: <47AA2C33.70902@frisch.fr> Date: Wed, 06 Feb 2008 22:52:51 +0100 From: Alain Frisch User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: =?ISO-8859-1?Q?B=FCnzli_Daniel?= Cc: caml-list List Subject: Re: [Caml-list] xmlm and names(paces) References: <998006DB-6778-42F1-B2F4-31CB5E9AA6A2@erratique.ch> In-Reply-To: <998006DB-6778-42F1-B2F4-31CB5E9AA6A2@erratique.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Miltered: at discorde with ID 47AA2D6A.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; frisch:01 frisch:01 bunzli:01 trivial:01 camlp:01 syntax:01 pxp:01 namespaces:01 normalized:01 dictionnary:01 namespaces:01 dictionnary:01 node:01 parser:01 nodes:01 Bünzli Daniel wrote: > As I previously said on this list I'm adding better namespace support to > xmlm. Up to now xmlm just parsed qualified names into their prefix and > local part (prefix, local). Now I'd like to provide the client with > expanded names (uri, local). > > Initially I planned to give the client choice between getting qualified > names or expanded names. However the prefix of qualified names is really > meaningless (it can be alpha converted) and thus cannot be used to > recognize anything in a document. One of the aim of xmlm is simplicity, > as such I think xmlm should only provide expanded names. The problem with expanded names is that it makes it quite tedious to pattern-match on element/attribute names (uri are long!). Of course, it is a trivial exercise in Camlp4 to create a nice syntax for that. Another option is to let the client provide a mapping from uri to fixed prefixes. (PXP can do that kind of prefix normalization.) It is also a good idea to be able to parse XML documents that conform to the XML spec but not the XML Namespaces spec. What about something like that: type name = string * [`N of string * string|`U of string * string|`X] The first component of name gives the full unparsed name from the XML document. The second component gives the qname decomposition; it can be either a known normalized prefix (relative to a dictionnary provided by the client) or an unknown URI. Or it can be an error (the document does not conform to the XML Namespaces spec). If the client does not provide any dictionnary of known prefixes, there will be no `N node. If the parser is run a non-namespace mode, there will be only `X nodes. examples: ("html:p", `N ("xhtml", "p")) the prefix html refers to the known xhtml namespace ("foo:x", `U ("http://unknownnamespaceuri", "x")) the prefix foo refers to a unknown uri ("x:y", `X) the prefix x is not bound to any namespace ("x::z", `X) name is ill-formed w.r.t. the XML Namespaces spec. Also, it is necessary to give the client a way to know the namespace bindings in scope at any node. Some XML languages like XML-Schema need this information. A possible way to do it is just to keep the xmlns declarations as regular attributes. As a minor alternative, in order to reduce the syntactic overhead, it is possible to use a single string to encode the three possible cases. E.g.: "p:xhtml" Known uri "x::http://unknownnamespaceuri" Unknown uri "" Ill-formed qname -- Alain