From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.1 required=5.0 tests=AWL,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id BDFCFBC69 for ; Thu, 19 Jul 2007 00:48:08 +0200 (CEST) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.227]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6IMm7OO027221 for ; Thu, 19 Jul 2007 00:48:08 +0200 Received: by nz-out-0506.google.com with SMTP id x7so279775nzc for ; Wed, 18 Jul 2007 15:48:07 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=J5smZF392aPlq9kKHTtxX4G9pGAXmoP9S7/1YbWnrrL/ORQk3MnDhNnEal/oxxjNFDp5ioLkKEt+qz5+vXdA8Da3CkfIrteBbUWGCAeGlKNvILmNOjTkWFXN/v1QshnYJgceZEU564LQYwjjzSI0SvW0YVVEXH68bN/7XKriuOM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=nlxcfcpkQpMEl8LcvY1BIsdgz6tX/E1WTz/nwoLK/zquZB294EUt/4j+msfBbI1To145TVXMGzMgHmkZ/nGe1hI9BCS7S1mivl/i+nsVag6NgzG0cUHa9f0GdG9sWkRsVjNFKM67OZBjohB7t7ekV+KH7cnpWmz9gTLvXbKNUgk= Received: by 10.143.33.19 with SMTP id l19mr157499wfj.1184798887174; Wed, 18 Jul 2007 15:48:07 -0700 (PDT) Received: by 10.143.168.5 with HTTP; Wed, 18 Jul 2007 15:48:07 -0700 (PDT) Message-ID: <9d3ec8300707181548n2c7ffa01xa9d2bea20c90c056@mail.gmail.com> Date: Thu, 19 Jul 2007 00:48:07 +0200 From: "Till Varoquaux" To: "Gabriel Kerneis" Subject: Re: [Caml-list] Fast XML parser Cc: caml-list@yquem.inria.fr In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Content-Disposition: inline References: <28fa90930707181458p26eac6e6y7b45018b7c91ca65@mail.gmail.com> X-j-chkmail-Score: MSGID : 469E98A7.000 on discorde : j-chkmail score : X : 0/20 1 0.000 -> 1 X-Miltered: at discorde with ID 469E98A7.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; parser:01 top-down:01 parser:01 pxp:01 klingon:01 subset:01 lexer:01 ocamllex:01 wiki:01 beginner's:01 ocaml:01 bug:01 expat:98 ...,:98 parsing:01 Ouch, I beg to differ, if you want speed and can work stream (linear top-down left-right exploration of the graph), you want an event based xml parser. expat is probably one of the fastest (the c library is known to be a speed demon). PXP does everything including talking klingon and controlling the kitchen sink. It provides an event based layer. I have found Xml-light to be the simplest parser. Alas, it is so simple it is far from implementing the full XML 1.1 specification. This often isn't an issue since most XML files are written in a very small subset of what the language. Ultimately if you are parsing very simple files and are aiming for pure speed you could write a simple lexer with ocamllex and use that as base layer. On 7/19/07, Gabriel Kerneis wrote: > Le Wed, 18 Jul 2007 14:58:35 -0700, "Luca de Alfaro" > a =E9crit : > > I am interested in parsing Wiki markup language that has a few tags, > > like
...
, ...,. > > These tags are sparse, meaning that the ratio of number of tags / > > number of bytes is low. > > I would like, given a string (or a stream) with such tags, to parse > > it as fast as possible. Efficiency is a primary consideration, and > > so is simplicity of the implementation. > > Do you have any advice about the library I should be using? > > You want it simple, you want it light : Xml-light. > > Regards, > -- > Gabriel > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > --=20 http://till-varoquaux.blogspot.com/