From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9d3ec8300707190202t57a63aber38d86a5310cd0e9a@mail.gmail.com>
Date: Thu, 19 Jul 2007 11:02:35 +0200
From: "Till Varoquaux"
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Fast XML parser
MIME-Version: 1.0
Content-Type:
text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <28fa90930707181458p26eac6e6y7b45018b7c91ca65@mail.gmail.com> <9d3ec8300707181548n2c7ffa01xa9d2bea20c90c056@mail.gmail.com>

Oops, forgot to "reply to all". Here we go again:

On 7/19/07, Gabriel Kerneis wrote:
>
> I certainly wouldn't recommend xml-light for *every* project where an
> XML parser is needed, but look at the OP's requirements :
>
> > > > I am interested in parsing Wiki markup language that has a few
> > > > tags, like ... , ...,.
> > > > These tags are sparse, meaning that the ratio of number of tags /
> > > > number of bytes is low.
>
> On such a simple case, xml-light (which is basically a simple ocamllex
> file + a few things to build the syntax tree) should perform quite
> well. I know it doesn't handle DTD, etc. but in *that* case, who cares ?
>

Xml-light would indeed provide a very simple parser and pretty good
speed. Whether to use it or an event-based parser depends on how big
these files really are (if they are not huge you shouldn't see a real
difference, so you might as well keep it simple).

As for compliance, xml-light sort of does DTD. The issue is a lot more
subtle: it drops many features from the XML standard (including the
encoding declaration) and thus will reject many valid XML documents.
This is, of course, not tolerable when you have to accept documents
from sources other than your own program... I wouldn't recommend
xml-light for any serious project reading XML files from the outside
world. It can, however, be great when you have control over the source
generating your documents (i.e. documents generated by xml-light
itself).

> > Ultimately if you are parsing very simple files and are aiming for
> > pure speed you could write a simple lexer with ocamllex and use that
> > as base layer.
>
> That could be a solution, and (provided the licence you chose for your
> project is compatible) you could even use xml-light as an example to
> begin with (stripping things you don't need).

Indeed, and that should be real quick to do since the source code is
simple and easy to read. I should have mentioned it.

Cheers,
Til
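
For illustration only, here is a rough sketch of the "simple lexer as
base layer" idea for text with sparse tags. It is hand-written plain
OCaml standing in for what an ocamllex-generated lexer would do; it is
not taken from xml-light, and the names `event` and `tokenize` are
made up for this example. It assumes tags have no attributes and
ignores entities, comments and encodings.

```ocaml
(* Hypothetical event type; not part of xml-light or ocamllex. *)
type event =
  | Text of string   (* raw text between tags *)
  | Open of string   (* <name> *)
  | Close of string  (* </name> *)

(* Scan [s] left to right and return its events in order.
   A '<' with no matching '>' is kept as plain text. *)
let tokenize (s : string) : event list =
  let n = String.length s in
  (* Push the slice s[i..j) as a Text event, unless it is empty. *)
  let text acc i j =
    if j > i then Text (String.sub s i (j - i)) :: acc else acc
  in
  let rec go acc i =
    if i >= n then List.rev acc
    else
      match String.index_from_opt s i '<' with
      | None -> List.rev (text acc i n)
      | Some lt ->
        (match String.index_from_opt s lt '>' with
         | None -> List.rev (text acc i n)
         | Some gt ->
           let acc = text acc i lt in
           let inner = String.sub s (lt + 1) (gt - lt - 1) in
           let ev =
             if inner <> "" && inner.[0] = '/'
             then Close (String.sub inner 1 (String.length inner - 1))
             else Open inner
           in
           go (ev :: acc) (gt + 1))
  in
  go [] 0
```

Since the tags are sparse, most of the work is the search for the next
'<', so long runs of plain text are skipped in one step. A tree
builder, if one is needed at all, can be layered on top of the event
list.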