caml-list - the Caml user's mailing list
From: "Till Varoquaux" <till.varoquaux@gmail.com>
To: "Gabriel Kerneis" <gabriel.kerneis@enst.fr>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Fast XML parser
Date: Thu, 19 Jul 2007 00:48:07 +0200
Message-ID: <9d3ec8300707181548n2c7ffa01xa9d2bea20c90c056@mail.gmail.com>
In-Reply-To: <E1IBHkI-0003Nt-Nj@kerneis.info>

Ouch,

I beg to differ: if you want speed and can work on a stream (a linear,
top-down, left-to-right exploration of the graph), you want an event-based
XML parser. Expat is probably one of the fastest (the C library is
known to be a speed demon). PXP does everything, including talking
Klingon and controlling the kitchen sink; it also provides an
event-based layer.
I have found Xml-light to be the simplest parser. Alas, it is so
simple that it is far from implementing the full XML 1.1 specification.
This often isn't an issue, since most XML files are written in a very
small subset of what the language allows.

Ultimately, if you are parsing very simple files and are aiming for
pure speed, you could write a simple lexer with ocamllex and use that
as the base layer.
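
To make the last suggestion concrete, here is a rough sketch of such a
base layer. It is written by hand in plain OCaml rather than with
ocamllex, so that it runs as-is, and it is event based in the sense
above: it walks the input once, left to right, and calls a function on
each piece it finds. The names (event, scan, events) are made up for
this illustration, and it assumes tags carry no attributes and that '<'
only ever starts a tag. It needs String.index_from_opt, i.e. OCaml 4.05
or later.

```ocaml
(* Hypothetical event type for sparse, attribute-free tags. *)
type event =
  | Text of string   (* raw text between tags *)
  | Open of string   (* <name> *)
  | Close of string  (* </name> *)

(* Scan [s] once, left to right, calling [f] on each event.
   Assumption: '<' always starts a tag; an unterminated '<...' at the
   end of the input is reported as plain text. *)
let scan (f : event -> unit) (s : string) : unit =
  let n = String.length s in
  let emit_text i j = if j > i then f (Text (String.sub s i (j - i))) in
  let rec go i =
    if i < n then begin
      match String.index_from_opt s i '<' with
      | None -> emit_text i n
      | Some lt ->
        emit_text i lt;
        begin match String.index_from_opt s lt '>' with
          | None -> emit_text lt n
          | Some gt ->
            if gt > lt + 1 && s.[lt + 1] = '/' then
              f (Close (String.sub s (lt + 2) (gt - lt - 2)))
            else
              f (Open (String.sub s (lt + 1) (gt - lt - 1)));
            go (gt + 1)
        end
    end
  in
  go 0

(* Convenience wrapper: collect the events into a list. *)
let events (s : string) : event list =
  let acc = ref [] in
  scan (fun e -> acc := e :: !acc) s;
  List.rev !acc
```

For example, events "a<pre>b</pre>c" yields Text "a", Open "pre",
Text "b", Close "pre", Text "c". Because the tags are sparse, most of
the time is spent searching for the next '<', which is about as cheap
as scanning gets; a real ocamllex lexer would do the same job with
rules instead of index arithmetic.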

On 7/19/07, Gabriel Kerneis <gabriel.kerneis@enst.fr> wrote:
> On Wed, 18 Jul 2007 14:58:35 -0700, "Luca de Alfaro"
> <luca@dealfaro.org> wrote:
> > I am interested in parsing Wiki markup language that has a few tags,
> > like <pre>...</pre>, <math>...</math>.
> > These tags are sparse, meaning that the ratio of number of tags /
> > number of bytes is low.
> > I would like, given a string (or a stream) with such tags, to parse
> > it as fast as possible.  Efficiency is a primary consideration, and
> > so is simplicity of the implementation.
> > Do you have any advice about the library I should be using?
>
> You want it simple, you want it light: Xml-light.
>
> Regards,
> --
> Gabriel


-- 
http://till-varoquaux.blogspot.com/



