From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 5B09EBBAF for ; Thu, 18 Sep 2008 11:12:28 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AikBAE+30UhKfSwclGdsb2JhbACSTz4BAQEBCQMKBxEDnxGHIwEChAU X-IronPort-AV: E=Sophos;i="4.32,420,1217800800"; d="scan'208";a="29318084" Received: from yx-out-2324.google.com ([74.125.44.28]) by mail4-smtp-sop.national.inria.fr with ESMTP; 18 Sep 2008 11:12:27 +0200 Received: by yx-out-2324.google.com with SMTP id 3so1114299yxj.3 for ; Thu, 18 Sep 2008 02:12:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:cc:in-reply-to:mime-version:content-type :content-transfer-encoding:content-disposition:references; bh=Yao6a1mU3ouG+R0y0CbybdvOwAYrOT8OM/Vgz/3GIn4=; b=MaLUeT4Cz8GBnKaOv1r5H063TsqAhjVBnsX613w24KGw/bR5GZiPOKACxdzrdY+u9z +do77ezF/1RThrWXtJKTuLbx4cr7oZKHrRwyk32xubkpadB1Xm8TyQz9DCbIGMTy57w8 cmCGeWx0QLKx6i9XPL/JevZT0YVZoXkRddU0c= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=e2LIhMW8m/A+CIiVoB2LRjXMQS+gz0VpIWPpGv+9+f0qZDnL9WRyurfIDawWgZdHhQ jW/zSti0GdiGatXSRVWh8Dhu+NVMJQEdPCBfASYIJnTSx3TMPG1ILuNF649/RP0FAKyc v1nJZn78VTACRhmvbJQ2Mxup1VPabv5gAbDno= Received: by 10.151.42.6 with SMTP id u6mr870967ybj.220.1221729146721; Thu, 18 Sep 2008 02:12:26 -0700 (PDT) Received: by 10.151.44.7 with HTTP; Thu, 18 Sep 2008 02:12:26 -0700 (PDT) Message-ID: <9d3ec8300809180212r7e3dcdf3wd13c5cff69d5034b@mail.gmail.com> Date: Thu, 18 Sep 2008 10:12:26 +0100 From: "Till Varoquaux" To: "Vincent Hanquez" Subject: Re: [Caml-list] XML library for validating MathML Cc: "Dario Teixeira" , caml-list@yquem.inria.fr In-Reply-To: <20080918083853.GA15219@snarc.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <103293.54569.qm@web54606.mail.re2.yahoo.com> <20080918083853.GA15219@snarc.org> X-Spam: no; 0.00; pxp:01 parsers:01 ocaml:01 pxp:01 gerd:01 parser:01 dtds:01 subset:01 beginner's:01 ocaml:01 bug:01 expat:98 engineered:98 1.0:98 ffff:98 PXP is tough to work with and feels a bit crazy but it is good with standards (It can sort out any DTD's I have ever thrown at it). xml-light is, well, very broken (it doesn't even support charcode switching). There are several XML parsers in OCaml and I've had a stint with a few of them; the only two I would consider using are expat and Pxp with a marked preference for the later. PXP can be very confusing and feels over engineered at times but it does the job. And remember parsing XML is a hard job, much harder than we often give it credit for.... Hats off to Gerd for providing us with a proper parser. Till On Thu, Sep 18, 2008 at 9:38 AM, Vincent Hanquez wrote: > On Wed, Sep 17, 2008 at 11:58:05AM -0700, Dario Teixeira wrote: >> Given a string containing a mathematical expression in the MathML >> markup, I need to verify that the expression is indeed valid MathML. >> I am therefore looking for an XML library that can verify an expression >> against a given DTD. >> >> Now, I have tried Xml-light, and the code I used is listed below. >> Unfortunately, it fails when trying to parse MathML's DTD (it's the >> standard DTD from the W3C). I have tried simpler DTDs, and it does work >> with them; am I therefore correct in assuming that Xml-light can only >> handle a particular version/subset of DTD features? > > I don't know about validation (i'll probably suggest looking at PXP tho), > but xml-light is very bad for XML compliance. the library is (happily) parsing > XML files that it shouldn't, which tell a lots concerning its validation > abilities ... > > for example, the XML supported character range is not even checked: > > Xml 1.0 specification -- 2.2 Characters > > Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | > [#xE000-#xFFFD] | [#x10000-#x10FFFF] > > others problems include (uncomplete list): > - complete unicode un-awareness > - funny & wrong entities handling > > -- > Vincent > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs >