From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.5 required=5.0 tests=AWL,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id D0B13BC6C for ; Wed, 30 Jan 2008 04:26:02 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAB99n0fAXQInh2dsb2JhbACQJwEBAQgKKZc/h2E X-IronPort-AV: E=Sophos;i="4.25,275,1199660400"; d="scan'208";a="7410333" Received: from concorde.inria.fr ([192.93.2.39]) by mail1-smtp-roc.national.inria.fr with ESMTP; 30 Jan 2008 04:26:02 +0100 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id m0U3Q26U008858 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Wed, 30 Jan 2008 04:26:02 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAB99n0dA6aayjGdsb2JhbACQJwEBAQgEBAsIEQWXQYdh X-IronPort-AV: E=Sophos;i="4.25,275,1199660400"; d="scan'208";a="6727209" Received: from py-out-1112.google.com ([64.233.166.178]) by mail2-smtp-roc.national.inria.fr with ESMTP; 30 Jan 2008 04:26:01 +0100 Received: by py-out-1112.google.com with SMTP id u52so116618pyb.10 for ; Tue, 29 Jan 2008 19:26:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=OuVxeWRd/PuICtwT5YRswjF9p7/KuM3KZBDMFIROwgY=; b=ixaY5gm9J2vrQ8VA0ysZDtXY/1wDnRlmJUNsZqbz+UerugSLSaWu4k86+ZWNaWtO9PF4ZNs6SK+KVY9oovmAXlnUWFrlCcsM6tcNOLdRl9CBbt82wP5fZH+bGs25RaYmaA/HANCC7XM1kwyvECMEHkwXbCghHOMuYZgzOaeAjGk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=yIZEVlmYku9fmOiJQDRuO2nfxlFWwTQgif+zaIZ633PB/FOBoGv3wo2FtTL0hejRiQXrdW4UbPGw/q1j3+F28hSFrUje5iRzwklwvqUOHEfm7Flio5EqJVJiWgTQfgvSdhMeT1ZZX8rAzUfuA1s7Die/08TlFcz52BtQRcnph4c= Received: by 10.142.83.4 with SMTP id g4mr102614wfb.103.1201663560302; Tue, 29 Jan 2008 19:26:00 -0800 (PST) Received: by 10.142.112.7 with HTTP; Tue, 29 Jan 2008 19:26:00 -0800 (PST) Message-ID: Date: Tue, 29 Jan 2008 22:26:00 -0500 From: "Jim Miller" To: "caml-list List" Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API In-Reply-To: <52FAAA41-5B70-4F87-9F83-B8A96EA48D34@erratique.ch> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <52FAAA41-5B70-4F87-9F83-B8A96EA48D34@erratique.ch> X-Miltered: at concorde with ID 479FEE4A.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; parsers:01 parser:01 parser:01 node:01 wiki:01 ocaml:01 val:01 val:01 pervasives:01 iter:01 invoke:01 abstract:01 signatures:01 parsing:01 parsing:01 > As such I'm not sure such an interface is really feasible. Now if you > see a common pattern or concrete type signatures that could be changed > to make parsers more compatible do not hesitate to communicate them. > If it benefits the users of my parser and remains in its philosophy > I'll happily implement them. But _you_ have to make concrete proposal, > I'm not going to research this. Please do not just initiate a > discussion because you like the abstract idea of being able to swap > xml parser implementations, make proposals. > > Best, > > Fair enough, I'll start with a proposal on the topic, though being late I'm not going to go too deep. If it gets through a first round of discussion, I'll start a node on the wiki and be happy to take point on maintaining a document based on feedback. My interest in this is based on my experience in dealing with XML. 90% of what I need to do is parse simple documents defining a known structure that are coming from either files, strings, or the network. Its also based on the responses I've received when attempting to evangelize OCaml to a crowd whose first task is typically to try and connect to the network, read some XML, do some processing on the XML, and generate a response. The purpose of this minimum implementation is to provide a common API to perform the following tasks: - Define a simple type that can be used to construct a tree representing XML data. - Parse an existing XML document into a simple data structure allowing access to the data - Manipulate the result of parsing the XML document - Construct simple XML documents XML parser implementations are free to expand beyond this implementation, this is merely a recommendation for a minimum implementation. type xmlNode = | XmlElement of (namespace: string * tagName: string * attributes: (string * string) list * (children:xmlNode list) ) | XmlPCData of (text:string) with the following functions to parse data from different types of sources. The parsing, by default, should be non-validating but will ensure well-formedness val parse_file : string -> xmlNode val parse_string: string -> xmlNode val parse_channel: Pervasives.in_channel -> xmlNode val to_string : xmlNode -> string val iter : (xmlNode -> unit) -> xmlNode -> unit val map : (xmlNode -> 'a) -> xmlNode -> 'a list val fold : ('a -> xmlNode -> 'a) -> 'a -> xmlNode -> 'a Additional sections/areas that would have to be defined: o Handling errors while parsing o Validation. I personally prefer having a different set of methods that perform the parsing with validation so that its obvious to me what is being performed when I invoke a function. I would be content with optional arguments to the parse_ functions that are defined above but with the default being to not validate. o Callback/SAX style API This is where I believe significant differences exist between XML implementations. I'm sure that the most that can be done here will be to standardize the names of the functions or types that are used.