caml-list - the Caml user's mailing list
From: Gerd Stolpmann <gerd@gerd-stolpmann.de>
To: Paul Argentoff <argentoff@rtelekom.ru>
Cc: caml-list@inria.fr
Subject: Re: :pxp_evpull notation (was: yet another silly question on PXP)
Date: Sun, 27 Feb 2005 20:05:24 +0100
Message-ID: <1109531124.5835.12.camel@localhost.localdomain>
In-Reply-To: <86vf8gk45m.fsf_-_@paul.rtelekom.ru>

On Friday, 25.02.2005, at 19:14 +0300, Paul Argentoff wrote:
> Dear Gerd Stolpmann,
> 
> Let GS = "Gerd Stolpmann" in
>   written_by GS => 
> 
>  GS> See the file doc/PREPROCESSOR which is part of the distribution
>  GS> tarball.
> 
> Thanks again for a reference. My next question is about :pxp_evpull
> notation. Can I make such a construct:
> 
> let pile = <:pxp_evpull<
>              <foo> (: some_fun () :) >>
> 
> where some_fun generates a further "subtree" using the same pxp_evpull
> notation. 

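Yes, this works. The expression between (: and :) is called when the
events for the children of foo are generated. It must evaluate to a
generator of type

unit -> Pxp_types.event option

and it is called repeatedly until it returns None. As the generated code
below shows, the expression is applied to the generator's argument, so
for (: some_fun () :) it is some_fun () that must have this type (or
write (: some_fun :) when some_fun itself has it).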
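
A minimal sketch of that contract in plain OCaml, assuming a simplified
stand-in for Pxp_types.event (the constructors below are hypothetical;
the real type also carries attributes, entity IDs and more). A generator
is just a closure that hands out one event per call and then None, and
the consumer keeps calling it until it sees None:

```ocaml
(* Hypothetical stand-in for Pxp_types.event, reduced to three cases. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Wrap a list as a pull generator: each call returns the next event,
   and None (forever) once the list is exhausted. *)
let generator_of_list (events : event list) : unit -> event option =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* Consumer side: call the generator repeatedly until it returns None. *)
let drain (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```

generator_of_list and drain are illustrative names, not PXP functions.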

pxp_evpull generates automata where every state returns an event.
External functions like some_fun are represented as loops: while the
function returns Some _, the automaton stays in the same state (an
E_end_of_stream event from the function is skipped); when it returns
None, the automaton moves on to the following state.

For your example, <:pxp_evpull< <foo> (: some_fun () :) >>, the
automaton is:

let _ =
  let _eid = Pxp_dtd.Entity.create_entity_id () in
  let rec _generator =
    let _state = ref 0 in
    fun _arg ->
      match !_state with
        0 ->
          let ev = Pxp_types.E_start_tag ("foo", [], None, _eid) in
          _state := 1; Some ev
      | 1 ->
          begin match some_fun () _arg with
            None -> _state := 2; _generator _arg
          | Some Pxp_types.E_end_of_stream -> _generator _arg
          | Some ev -> Some ev
          end
      | 2 ->
          let ev = Pxp_types.E_end_tag ("foo", _eid) in _state := 3; Some ev
      | 3 -> None
      | _ -> assert false
  in
  _generator

(output generated with "camlp4 -I ... pa_o.cmo pa_op.cmo pcre.cma
unix.cma netstring.cma pxp_pp.cma pr_o.cmo sample.ml")
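
To make the state machine easier to follow, here is a hand-written mirror
of the automaton above in plain OCaml. It is a sketch, not preprocessor
output: it uses a hypothetical simplified event type in place of
Pxp_types.event, drops the entity ID, and takes the child generator as a
parameter (child) instead of splicing some_fun () in place.

```ocaml
(* Hypothetical stand-in for Pxp_types.event (no attributes, no entity IDs). *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string
  | E_end_of_stream

(* Mirror of the generated automaton for <foo> (: child :).
   Every state yields one event; state 1 loops on the child. *)
let foo_automaton (child : unit -> event option) : unit -> event option =
  let state = ref 0 in
  let rec next () =
    match !state with
    | 0 -> state := 1; Some (E_start_tag "foo")
    | 1 ->
        begin match child () with
        | None -> state := 2; next ()       (* child done: go to state 2 *)
        | Some E_end_of_stream -> next ()   (* skipped, stay in state 1 *)
        | Some ev -> Some ev                (* pass through, stay in state 1 *)
        end
    | 2 -> state := 3; Some (E_end_tag "foo")
    | 3 -> None
    | _ -> assert false
  in
  next

(* Helpers for trying it out: a list-backed child and a draining loop. *)
let generator_of_list (events : event list) : unit -> event option =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

let drain (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```

Draining foo_automaton with a child that yields E_char_data "a",
E_end_of_stream, E_char_data "b" gives E_start_tag "foo", E_char_data "a",
E_char_data "b", E_end_tag "foo"; the child's E_end_of_stream is dropped.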

some_fun can even be another pxp_evpull automaton.
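
Nesting can be sketched the same way: if every element is an automaton of
type unit -> event option, an inner automaton can serve directly as the
child generator of an outer one. The helpers element, text and concat
below are hypothetical, and the event type is again a simplified stand-in
(here without E_end_of_stream).

```ocaml
(* Simplified stand-in for Pxp_types.event. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

type gen = unit -> event option

(* <name> child </name>: start tag, then loop on the child until None,
   then end tag; same state layout as the generated automaton. *)
let element (name : string) (child : gen) : gen =
  let state = ref 0 in
  let rec next () =
    match !state with
    | 0 -> state := 1; Some (E_start_tag name)
    | 1 ->
        begin match child () with
        | None -> state := 2; next ()
        | Some ev -> Some ev
        end
    | 2 -> state := 3; Some (E_end_tag name)
    | _ -> None
  in
  next

(* A single character-data event. *)
let text (s : string) : gen =
  let used = ref false in
  fun () -> if !used then None else (used := true; Some (E_char_data s))

(* Run several generators one after another. *)
let concat (gens : gen list) : gen =
  let rest = ref gens in
  let rec next () =
    match !rest with
    | [] -> None
    | g :: tl ->
        begin match g () with
        | Some ev -> Some ev
        | None -> rest := tl; next ()
        end
  in
  next

let drain (g : gen) : event list =
  let rec loop acc =
    match g () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```

In this sketch each inner generator is constructed once, before the outer
automaton starts pulling, so its state survives between calls. In the
generated code above the spliced expression (some_fun () _arg) is
re-evaluated on every pull, which suggests binding a nested automaton to a
name first if constructing it creates fresh state.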

> My task really is to build a converter from a huge (>100M) text file (or
> string Stream.t) to a huge xml file. Of course, I need to do all job with
> lazy streams to avoid out-of-memory exceptions.

Pull parsers are your friend. They were created with such applications
in mind.
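
For the converter, a rough sketch of the pull style in plain OCaml (not
using PXP itself): input lines are pulled one at a time, turned into
events on demand, and serialized immediately, so memory use does not grow
with the file size. The <doc>/<line> output format, the event type and
all function names here are hypothetical, and escaping covers only &, <
and >.

```ocaml
(* Simplified stand-in for Pxp_types.event. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Pull lines from a channel on demand; only the current line is held. *)
let lines_of_channel (ic : in_channel) : unit -> string option =
  fun () -> try Some (input_line ic) with End_of_file -> None

type phase = Start | Body | Done

(* Hypothetical mapping: every input line becomes <line>...</line> inside
   one <doc> root. At most two events are buffered per line. *)
let events_of_lines (next_line : unit -> string option) : unit -> event option =
  let pending = Queue.create () in
  let phase = ref Start in
  fun () ->
    if not (Queue.is_empty pending) then Some (Queue.pop pending)
    else
      match !phase with
      | Start -> phase := Body; Some (E_start_tag "doc")
      | Body ->
          begin match next_line () with
          | Some l ->
              Queue.push (E_char_data l) pending;
              Queue.push (E_end_tag "line") pending;
              Some (E_start_tag "line")
          | None -> phase := Done; Some (E_end_tag "doc")
          end
      | Done -> None

(* Escape the three characters that matter in character data. *)
let escape (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Write each event as soon as it is pulled; nothing accumulates. *)
let serialize (emit : string -> unit) (gen : unit -> event option) : unit =
  let rec loop () =
    match gen () with
    | None -> ()
    | Some ev ->
        (match ev with
         | E_start_tag n -> emit ("<" ^ n ^ ">")
         | E_char_data s -> emit (escape s)
         | E_end_tag n -> emit ("</" ^ n ^ ">"));
        loop ()
  in
  loop ()
```

Hooked up to real channels this would be, e.g.,
serialize (output_string stdout) (events_of_lines (lines_of_channel stdin)).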

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------



Thread overview: 12+ messages
2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
2005-02-22 18:25 ` Paul Argentoff
2005-02-22 19:03   ` Gerd Stolpmann
2005-02-24  7:49     ` Paul Argentoff
2005-02-24 12:11     ` Paul Argentoff
2005-02-25  7:35       ` Paul Argentoff
2005-02-25 16:14     ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
2005-02-27 19:05       ` Gerd Stolpmann [this message]
2005-02-28 10:24         ` :pxp_evpull notation Paul Argentoff
2005-02-28 10:39           ` Gerd Stolpmann
2005-02-28 11:00             ` Paul Argentoff
