From: Bünzli Daniel
To: caml-list
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
Date: Wed, 30 Jan 2008 11:32:41 +0100
Message-Id: <920A850B-7FB2-4E2D-8E2C-573029E4C335@erratique.ch>
In-Reply-To: <47A028D0.2000909@frisch.fr>
References: <52FAAA41-5B70-4F87-9F83-B8A96EA48D34@erratique.ch> <47A028D0.2000909@frisch.fr>

> Jim Miller wrote:
>> type xmlNode =
>>   | XmlElement of (namespace: string * tagName: string * attributes:
>>     (string * string) list * (children:xmlNode list) )
>>   | XmlPCData of (text:string)

Attributes can have their own namespace; have a look at the spec [1]. I see it more this way (but I'm biased):

type name = string * string
type attribute = name * string
type tag = name * attribute list

etc.

Adding to Alain's list, other things that need to be specified:

- what you do with processing instructions and comments.
- whether character references and predefined entities are resolved.
- how you deal with external entity references.
- where the parsing ends (I don't do it according to the XML spec because, in the words of the spec editor himself [2], the spec is broken).

I documented many of these issues for my own parser. You may want to check that out [3]; it may show you some of the specification details that are needed (note that the tree and the cursor representations are going away in the next version).
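
To make the point about attribute namespaces concrete, here is a minimal sketch that completes the three type definitions above into a small tree type. The `tree` type, the `attribute_value` helper and the example document are assumptions made for illustration only; they are not the API of any existing parser.

```ocaml
(* Illustrative sketch only: [tree], [attribute_value], [xlink] and
   [example] are hypothetical names, not part of an existing library. *)

type name = string * string        (* namespace URI, local name *)
type attribute = name * string     (* qualified name, value *)
type tag = name * attribute list   (* element name, its attributes *)

(* Assumed tree shape: elements with children, or character data. *)
type tree =
  | Element of tag * tree list
  | Data of string

(* Find an attribute value by its full (namespace, local name) pair. *)
let attribute_value (n : name) ((_, atts) : tag) : string option =
  List.assoc_opt n atts

let xlink = "http://www.w3.org/1999/xlink"

(* Represents <a xlink:href="http://example.org" href="local">text</a>.
   Both attributes have the local name "href"; only the namespace tells
   them apart. Unprefixed attributes are in no namespace, written "". *)
let example : tree =
  Element
    ( ( ("", "a"),
        [ ((xlink, "href"), "http://example.org");
          (("", "href"), "local") ] ),
      [ Data "text" ] )
```

With a flat `(string * string) list` for attributes, the two `href` attributes in `example` could not be distinguished; pairing each attribute name with its namespace keeps them separate.
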

Best,

Daniel

[1] http://www.w3.org/TR/REC-xml-names/
[2] http://www.xml.com/axml/notes/TrailingMisc.html
[3] http://erratique.ch/software/xmlm/doc/Xmlm#io