Message-Id: <36788A32-5BAC-4D57-9D0A-B9A20A49536F@erratique.ch>
From: Bünzli Daniel
To: caml-list
In-Reply-To: <47A7EDFE.1050805@frisch.fr>
References: <52FAAA41-5B70-4F87-9F83-B8A96EA48D34@erratique.ch> <200801301035.31447.jon@ffconsultancy.com> <47A7EDFE.1050805@frisch.fr>
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
Date: Tue, 5 Feb 2008 09:36:02 +0100

On 5 Feb 2008, at 06:02, Alain Frisch wrote:

> As suggested before, you really need to say something, at least,
> about: [...]

- Whether character references and predefined entity references must
  be resolved. Hint: yes.

On 5 Feb 2008, at 06:02, Alain Frisch wrote:

> - having a common spec for several libs makes more sense if they can
>   share common types; maybe you should use polymorphic variants
>   instead of regular ones?

Agreed. In xmlm these variants become polymorphic in the next version.

Other comments.

* IMHO, do not use camel casing. Underscores are more Caml-like, i.e.
  xml_node, etc.

* Regarding naming, I would call xmlNode xml_tree and in general drop
  the xml prefix from the cases.

* The "combine" argument: in my opinion the parser should always combine
  adjacent pcdata nodes.

* As others may now know, I don't like to raise exceptions; the next
  version of xmlm doesn't raise exceptions (but given recent discussions
  it seems others do like exceptions).

* Regarding the way the parser is invoked, I don't like the way it is
  done: (1) I can only use the function "parse" with channels, which is
  not good. (2) A convenience function like parse_file is always useless
  to me, since it is hard to know the exact kind of error handling
  performed by such functions without looking at their source.

The way I do this kind of thing is to define an input abstraction
type. First you create an input abstraction from a data source (e.g.
in_channel, strings, and a callback source) and then you invoke
the parser with the input abstraction (actually I started an OSR on
devising IO modules with non-object-oriented IO sources and
destinations reflecting this view, but I'm reluctant to publish it).

In general I'd like to say that I'm a little bit dubious about this
effort. Actually I would refrain from formalizing the actual way the
parser is invoked; clients can also perform their bit of work. I would
concentrate on defining:

1) Parsing _result_ types and a precise definition of the actual
   _form_ of the data they contain. More than one form may be defined.
   This is the most important thing if you would like to be able to
   switch implementations; the actual input procedure can easily be
   isolated from the rest of your source.

2) A minimal list of input sources (e.g. in_channel and string) from
   which the parser should be able to read, without going into further
   detail on how the actual input procedure should be performed. Just
   specify the state in which sources are accepted for input and left
   after output.

Best,

Daniel
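
The input abstraction described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the names input, input_of_string, input_of_channel, input_of_fun and count_open_angles are hypothetical and are not taken from xmlm or from the OSR proposal, and count_open_angles is merely a stand-in for a real parser.

```ocaml
(* Hypothetical sketch: none of these names come from xmlm or the OSR. *)

(* An input abstraction: each call yields the next character,
   or None once the source is exhausted. *)
type input = unit -> char option

(* Build an input from a string. *)
let input_of_string (s : string) : input =
  let pos = ref 0 in
  fun () ->
    if !pos >= String.length s then None
    else begin
      let c = s.[!pos] in
      incr pos;
      Some c
    end

(* Build an input from an in_channel. The channel is left open;
   closing it remains the client's job. *)
let input_of_channel (ic : in_channel) : input =
  fun () -> try Some (input_char ic) with End_of_file -> None

(* Build an input from an arbitrary callback source. *)
let input_of_fun (f : unit -> char option) : input = f

(* Stand-in for a parser: it only sees the abstraction, never the
   concrete source. Here it just counts '<' characters. *)
let count_open_angles (i : input) : int =
  let rec loop n =
    match i () with
    | None -> n
    | Some '<' -> loop (n + 1)
    | Some _ -> loop n
  in
  loop 0

let () =
  let n = count_open_angles (input_of_string "<a><b/>text</a>") in
  Printf.printf "%d\n" n
```

With this shape the consumer takes a single argument type, and error handling for opening or closing files stays with the client rather than being hidden in a parse_file convenience function.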