From: "Luca de Alfaro"
To: caml-list@yquem.inria.fr
Date: Wed, 18 Jul 2007 14:58:35 -0700
Subject: Fast XML parser
Message-ID: <28fa90930707181458p26eac6e6y7b45018b7c91ca65@mail.gmail.com>

I am interested in parsing Wiki markup language that has a few tags, like <pre>...</pre>, <math>...,</math>.
These tags are sparse: the ratio of the number of tags to the number of bytes is low.
I would like, given a string (or a stream) with such tags, to parse it as fast as possible.  Efficiency is a primary consideration, and so is simplicity of the implementation.
Do you have any advice about the library I should be using?
Thanks,

Luca
