caml-list - the Caml user's mailing list
From: Gabriel Kerneis <gabriel.kerneis@enst.fr>
To: "Till Varoquaux" <till.varoquaux@gmail.com>, caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Fast XML parser
Date: Thu, 19 Jul 2007 08:24:21 +0200
Message-ID: <E1IBPR6-0000rx-I4@kerneis.info>
In-Reply-To: <9d3ec8300707181548n2c7ffa01xa9d2bea20c90c056@mail.gmail.com>

On Thu, 19 Jul 2007 00:48:07 +0200, "Till Varoquaux"
<till.varoquaux@gmail.com> wrote:
> Ouch,
> 
> I beg to differ: if you want speed and can work on a stream (linear
> top-down, left-to-right exploration of the graph), you want an
> event-based XML parser. expat is probably one of the fastest (the C
> library is known to be a speed demon). PXP does everything, including
> talking Klingon and controlling the kitchen sink. It provides an
> event-based layer.

I certainly wouldn't recommend xml-light for *every* project where an
XML parser is needed, but look at the OP's requirements:
> > > I am interested in parsing Wiki markup language that has a few
> > > tags, like <pre>...</pre>, <math>...</math>.
> > > These tags are sparse, meaning that the ratio of number of tags /
> > > number of bytes is low.
In such a simple case, xml-light (which is basically a simple ocamllex
file plus a few things to build the syntax tree) should perform quite
well. I know it doesn't handle DTDs, etc., but in *that* case, who cares?

> Ultimately if you are parsing very simple files and are aiming for
> pure speed you could write a simple lexer with ocamllex and use that
> as base layer.

That could be a solution, and (provided the licence you choose for your
project is compatible) you could even use xml-light as a starting
point, stripping out the parts you don't need.

Kind regards,
-- 
Gabriel

Thread overview: 7+ messages
2007-07-18 21:58 Luca de Alfaro
2007-07-18 22:11 ` [Caml-list] " Gabriel Kerneis
2007-07-18 22:48   ` Till Varoquaux
2007-07-19  6:24     ` Gabriel Kerneis [this message]
2007-07-19  9:02       ` Till Varoquaux
2007-07-19 11:38 ` Richard Jones
2007-07-20  7:01 ` Jon Harrop
