From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id C3978BBAF for ; Thu, 18 Sep 2008 19:58:04 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AigBANUz0kjOvjGrmmdsb2JhbACTDgEBAQEBCAUIBxEDpwgDhAU X-IronPort-AV: E=Sophos;i="4.32,422,1217800800"; d="scan'208";a="15107294" Received: from web54601.mail.re2.yahoo.com ([206.190.49.171]) by mail2-smtp-roc.national.inria.fr with SMTP; 18 Sep 2008 19:58:04 +0200 Received: (qmail 34842 invoked by uid 60001); 18 Sep 2008 17:58:03 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID; b=Y4VgnAQvXVAk+sPxncrBVq28U7hjFAb+LrZ6wx3Mp+MiLquYAi/vorMd3feTl7etootXsWPMnSqTUR9T5rWbTMlXpTlswFenjhiKGmw4bkHs/JMgHxDeY69SJACr+lCLBbK37eUVHl5vGS2dECrH5wDt+uKNWsPXIQ7wF+2Vnnc=; X-YMail-OSG: H.UzFm8VM1lOMtjC_6p2dQ_3DE8VypapxpywHAbA7DTNnmVVzvC6WuSgjzccwX1eAtlsqCo2jOULkrvNQR2kfhGGGLfu_sJCiRqKjgcsae0KAEcvyWbb6h3MrQiWsXlZ9mqkiUBaBC05BzVQuCObaEr0m89IPPxBJwQ5z8hAnN6IAn8gJC_2 Received: from [213.205.69.150] by web54601.mail.re2.yahoo.com via HTTP; Thu, 18 Sep 2008 10:58:03 PDT X-Mailer: YahooMailWebService/0.7.218.2 Date: Thu, 18 Sep 2008 10:58:03 -0700 (PDT) From: Dario Teixeira Subject: Re: [Caml-list] XML library for validating MathML To: caml-list@yquem.inria.fr In-Reply-To: <38632.29890.qm@web54602.mail.re2.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Message-ID: <359336.28901.qm@web54601.mail.re2.yahoo.com> X-Spam: no; 0.00; pxp:01 pxp:01 endline:01 node:01 endline:01 nodes:01 iter:01 exn:01 2.0:98 28,:98 amp:98 28,:98 amp:98 config:01 config:01 Hi, Well, as it turns out, building a basic "Hello World" in PXP is relatively simple (I followed the manual which is very helpful in the beginning). However, though the DTD validation works fine with the simple examples I tr= ied, it fails for a MathML document. Note that I am using the DTD as provided by the W3C, available from here: http://www.w3.org/Math/DTD/mathml2.tgz When processing the MathML DTD, PXP outputs a few a warnings about entities declared twice, about names reserved for future extensions, and quite a lot of warnings about code points that cannot be represented. I can ignore those for now. When it does fail, this is the error produced: In entity ent-isonum =3D PUBLIC "-//W3C//ENTITIES Numeric and Special Graph= ic for MathML 2.0//EN" "isonum.ent", at line 28, position 44: Called from entity [dtd] =3D SYSTEM "mathml2.dtd", line 1969, position 0: ERROR (Well-formedness constraint): The character '&' must be written as '&= amp;' Looking at the "isonum.ent" file (packaged with the W3C zip), these are the contents of line 28, where the error occurs: Though 0x26 is indeed the codepoint for the ampersand character, I don't get why it appears twice. Is this a case of double escaping? Could this be the reason PXP chokes? Any thoughts? Best regards, Dario Teixeira P.S. This is the programme I used for testing. Its code is pretty much lifted from the PXP manual: open Pxp_document open Pxp_yacc class warner =3D object method warn w =3D print_endline ("WARNING: " ^ w) end let rec print_structure n =3D let ntype =3D n#node_type in match ntype with | T_element name -> print_endline ("Element of type " ^ name); let children =3D n # sub_nodes in List.iter print_structure children | T_data -> print_endline "Data" | _ -> assert false let () =3D try let config =3D {default_config with warner =3D new warner} = in let doc =3D parse_document_entity config (from_file "test.x= ml") default_spec in print_structure (doc#root) with exc -> print_endline (Pxp_types.string_of_exn exc) =0A=0A=0A