caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: "Alexander V. Voinov" <avv@quasar.ipa.nw.ru>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] ocaml-3.05: a performance experience
Date: Sun, 4 Aug 2002 22:45:32 +0200
Message-ID: <20020804204532.GA9405@ice.gerd-stolpmann.de>
In-Reply-To: <3D4C965D.775F23DD@quasar.ipa.nw.ru>; from avv@quasar.ipa.nw.ru on Sun, Aug 04, 2002 at 04:50:05 +0200


On 2002.08.04 04:50 Alexander V. Voinov wrote:
> Hi Gerd,
> 
> Gerd Stolpmann wrote:
> > If XML validation is not needed, you could also rewrite your program
> > to use the new event-based parsing in PXP-1.1.90. That would completely
> > avoid representing the XML tree in memory (and increase the speed, because
> > GC of large memory footprints is expensive).
> 
> thanks again, but it's not yet officially announced, is it? I managed to
> download it, but I didn't find any direct link. Also, is this parsing
> mode mentioned in the manual?

It is experimental code, but event-based parsing will definitely remain
in the parser until the next stable release. Details of the interface may
change, however. (I call a release "stable" when the interface has matured,
and when all regression tests have passed. The experimental releases usually
work, but it is more likely that there is some "overlooked case" in the
code.)

The manual is not yet updated; there is only a description in the mli file
and a small example. In particular, there is the type

type event =
  | E_start_doc of (string * bool * dtd)
  | E_end_doc
  | E_start_tag of (string * (string * string) list * Pxp_lexer_types.entity_id)
  | E_end_tag   of (string * Pxp_lexer_types.entity_id)
  | E_char_data of  string
  | E_pinstr of (string * string)
  | E_comment of string
  | E_position of (string * int * int)
  | E_error of exn
  | E_end_of_stream

and a function is called back for each of these events. For example, for

<A x="1">Q<B>R</B>S</A>

you would get the events

E_start_doc("1.0",false,dtd)
E_start_tag("A", ["x", "1"], ent_a)
E_char_data "Q"
E_start_tag("B", [], ent_b)
E_char_data "R"
E_end_tag("B", ent_b)
E_char_data "S"
E_end_tag("A", ent_a)

The parser has already checked that the document is well-formed, so for each
E_end_tag there is always a matching E_start_tag.
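
To make the callback style concrete, here is a minimal sketch of a handler
that consumes the event sequence listed above. It is not taken from PXP: the
event type is a local copy, dtd and entity_id are replaced by placeholder
types so that the snippet compiles on its own, and handle, sample and the
state variables are hypothetical names. The events are fed by hand rather
than by the parser.

```ocaml
(* Placeholder stand-ins for the PXP types; assumptions for illustration. *)
type dtd = Dtd_placeholder
type entity_id = Entity_placeholder

(* Local copy of the event type described in the mli file. *)
type event =
  | E_start_doc of (string * bool * dtd)
  | E_end_doc
  | E_start_tag of (string * (string * string) list * entity_id)
  | E_end_tag of (string * entity_id)
  | E_char_data of string
  | E_pinstr of (string * string)
  | E_comment of string
  | E_position of (string * int * int)
  | E_error of exn
  | E_end_of_stream

(* State kept by the callback: open elements, collected text, depth. *)
let open_tags : string list ref = ref []
let text = Buffer.create 16
let max_depth = ref 0

(* Hypothetical callback, invoked once per event. *)
let handle (ev : event) : unit =
  match ev with
  | E_start_tag (name, _attrs, _) ->
      open_tags := name :: !open_tags;
      max_depth := max !max_depth (List.length !open_tags)
  | E_end_tag (name, _) ->
      (* Well-formedness means the innermost open element must match. *)
      (match !open_tags with
       | top :: rest when top = name -> open_tags := rest
       | _ -> failwith ("unexpected end tag: " ^ name))
  | E_char_data s -> Buffer.add_string text s
  | E_error e -> raise e
  | E_start_doc _ | E_end_doc | E_pinstr _ | E_comment _
  | E_position _ | E_end_of_stream -> ()

(* Events for <A x="1">Q<B>R</B>S</A>, written out by hand. *)
let e = Entity_placeholder
let sample =
  [ E_start_doc ("1.0", false, Dtd_placeholder);
    E_start_tag ("A", [ ("x", "1") ], e);
    E_char_data "Q";
    E_start_tag ("B", [], e);
    E_char_data "R";
    E_end_tag ("B", e);
    E_char_data "S";
    E_end_tag ("A", e) ]

let () = List.iter handle sample
```

Since no tree is built, the only memory the handler needs is the stack of
currently open elements plus whatever the application chooses to keep (here,
the character data).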

Because the parser "pushes" the events to the application, this is a so-called
"push parser". There are plans for a "pull parser", too (the application calls
a next_event function to get the events), as this would make it possible to
create streams of XML events.
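
The difference between the two calling conventions can be sketched as
follows. This is only an illustration under stated assumptions: the event
type is reduced to four cases, push_parse is a hypothetical stand-in for a
push parser, and make_pull simply buffers all events in a queue before
handing them out. A real pull parser would produce events on demand; the
buffering here gives up the memory advantage and only shows what the
next_event interface looks like from the application's side.

```ocaml
(* Reduced event type for the sketch; the real type has more cases. *)
type event =
  | E_start_tag of string
  | E_end_tag of string
  | E_char_data of string
  | E_end_of_stream

(* Hypothetical stand-in for a push parser: calls [callback] per event. *)
let push_parse (callback : event -> unit) : unit =
  List.iter callback
    [ E_start_tag "A"; E_char_data "Q"; E_end_tag "A" ]

(* Build a next_event function on top of a push interface by buffering. *)
let make_pull (parse : (event -> unit) -> unit) : unit -> event =
  let q = Queue.create () in
  parse (fun ev -> Queue.add ev q);
  fun () -> if Queue.is_empty q then E_end_of_stream else Queue.pop q

let next_event = make_pull push_parse

(* The application now drives the loop: pull until the stream ends. *)
let rec count_starts acc =
  match next_event () with
  | E_end_of_stream -> acc
  | E_start_tag _ -> count_starts (acc + 1)
  | _ -> count_starts acc

let n_starts = count_starts 0
```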

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------


